XGBoost decision tree machine learning

I got this figure when I used the XGBoost regressor on a large dataset (3 MB); the plotted decision tree has far too many details and is unreadable.
[figure: rendered XGBoost tree, too dense to read]
What is the solution?
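One common fix is to limit tree depth at training time and then plot a single tree on a large canvas. A minimal sketch with xgboost's plotting API (requires graphviz installed; the toy data and `max_depth=4` are illustrative assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

# Toy data standing in for the real 3 MB dataset.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 8)), rng.normal(size=500)

# Restricting depth at training time keeps each individual tree readable.
reg = xgb.XGBRegressor(max_depth=4, n_estimators=50)
reg.fit(X, y)

# Plot only the first tree (num_trees=0), left-to-right, on a big figure.
fig, ax = plt.subplots(figsize=(30, 15))
xgb.plot_tree(reg, num_trees=0, rankdir="LR", ax=ax)
fig.savefig("tree0.png", dpi=200)
```

Note that a boosted model is an ensemble of many trees, so plotting everything at once will always be cluttered; inspecting one shallow tree at a time is usually the practical option.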


Extract knowledge from a given dataset

I have two issues:
1. How can I extract knowledge, for example predicates, from a given dataset (i.e., a public dataset of the kind used to train neural networks)?
2. How can I feed this knowledge into a neural network (i.e., the network receives the knowledge, e.g. predicates, together with the dataset's features in the input layer)?
Any ideas?
Thank you in advance.
I'm trying to extract knowledge (predicates) from a dataset and then train the neural network on both the dataset's features and the extracted predicates.
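For the second issue, one possible sketch: if each extracted predicate can be encoded per example as a binary value (holds / doesn't hold), you can concatenate the predicate vector with the raw features at the input layer using the Keras functional API. All names, shapes, and the toy data below are illustrative assumptions, not a prescribed method:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_predicates = 20, 5  # illustrative sizes

# Two inputs: raw dataset features and a binary predicate vector per example.
features_in = keras.Input(shape=(n_features,), name="features")
predicates_in = keras.Input(shape=(n_predicates,), name="predicates")

# Concatenate so the network sees features and knowledge side by side.
x = layers.Concatenate()([features_in, predicates_in])
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model([features_in, predicates_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-ins for real features and extracted predicates.
X = np.random.rand(100, n_features)
P = np.random.randint(0, 2, size=(100, n_predicates)).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")
model.fit([X, P], y, epochs=2, verbose=0)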

Keras deep learning sentiment analysis - supervised or unsupervised

I am a bit confused about the topic of deep learning.
My question: let's assume we have a task to solve. Reviews should be classified as positive or negative using a Keras deep learning model.
Now: does this task belong to supervised or unsupervised learning? Why? How do deep learning and neural networks work here? How do they learn? Would it be better to use a classical machine learning algorithm for this task?
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.
(Definitions from Wikipedia and MathWorks)
There are already labeled datasets for the task you mention (each review comes with its sentiment label), so you can always model it as a supervised learning problem and use a machine learning model such as an SVM, Random Forest, or MLP to solve it. Some examples (a minimal Keras sketch follows the list):
https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data
https://www.kaggle.com/snap/amazon-fine-food-reviews
https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018
https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews
https://www.kaggle.com/utathya/imdb-review-dataset
https://www.kaggle.com/datafiniti/hotel-reviews
https://www.kaggle.com/sid321axn/amazon-alexa-reviews
https://www.kaggle.com/bittlingmayer/amazonreviews
https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
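Since the labels make this supervised, a Keras model for it is just a binary classifier trained on (review, label) pairs: the network adjusts its weights to reduce the gap between its predicted sentiment and the true label. A minimal sketch (vocabulary size, sequence length, and the random toy data are illustrative stand-ins for a real integer-encoded review corpus):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_len = 10000, 200  # illustrative choices

# x: integer-encoded reviews, y: 1 = positive, 0 = negative (toy stand-ins).
x = np.random.randint(1, vocab_size, size=(1000, max_len))
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    layers.Embedding(vocab_size, 64),       # learns word vectors
    layers.LSTM(32),                        # learns word-order features
    layers.Dense(1, activation="sigmoid"),  # probability of "positive"
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=2, validation_split=0.2, verbose=0)
```

Deep learning here is not an alternative to machine learning; it is a subset of it. Whether an LSTM beats an SVM or Random Forest on your data is an empirical question, so it is worth benchmarking both.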

loading even layers of pre-trained BERT for classification

I'm using the transformers library from HuggingFace. As far as I know, changing the number of hidden layers in the config file leads to loading the first x layers of the pre-trained BERT. I want to load the even layers (or the last x layers) of the pre-trained BERT and then fine-tune them for a classification task.
An example for classification tasks can be found here: run_glue.py
Thanks in advance
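As far as I know there is no config flag that selects arbitrary layers, but one workaround is to load the full pre-trained model and then replace the encoder's layer list with the subset you want before fine-tuning. A hedged sketch (the even-index selection is the assumption here; whether the resulting model fine-tunes well is something to verify empirically):

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Keep only the even-indexed encoder layers (0, 2, 4, ... of the 12).
kept = [model.bert.encoder.layer[i] for i in range(0, 12, 2)]
model.bert.encoder.layer = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# For "the last x layers" instead, slice from the end, e.g.:
# kept = list(model.bert.encoder.layer[-4:])
```

The truncated model can then be fine-tuned with your usual training loop or a script like run_glue.py.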

How to use doc2vec embeddings as an input to a neural network

I'm trying to slowly begin working on a Twitter recommender system as part of a project, which requires me to use some form of deep learning. My goal is to recommend other tweets based on the topical content of a tweet with unlabelled data.
I have pre-processed my data and trained a few variations of doc2vec models to get both word embeddings and document embeddings. But my issue is that I feel a little lost about where to go from here. I've read that doc2vec output can be used as input to a deeper neural network for training, such as an LSTM or even a CNN.
Could anyone help me understand how these document embeddings (and word embeddings; I trained the model in DM mode) are used as input, and what the purpose of the neural net would be in this case? Is it for clustering? I understand the question is a little open-ended, but I'm quite new to all this; any help would be appreciated.
If you have trained a d-dimensional doc2vec model, each document's vector becomes the input vector for that particular tweet. With n documents you get an n×d matrix, and this matrix can be given to the neural network. LSTM and CNN models are used for supervised learning problems (where you have labeled data).
If you don't have labelled data, go for unsupervised learning. Clustering comes under this: you can run different clustering algorithms and recommend based on the clusters.
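A minimal sketch of that unsupervised route: treat the doc2vec vectors as an n×d matrix, cluster them, and recommend the most similar tweets within the same cluster. The random matrix below is a stand-in for your trained document vectors (with gensim you would pull them from the trained Doc2Vec model instead), and the cluster count is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

n_docs, d = 1000, 100                  # illustrative sizes
doc_vecs = np.random.rand(n_docs, d)   # stand-in for doc2vec vectors

# Group topically similar tweets.
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(doc_vecs)

def recommend(query_idx, k=5):
    """Return indices of the k most similar tweets in the query's cluster."""
    same = np.where(labels == labels[query_idx])[0]
    sims = cosine_similarity(doc_vecs[query_idx:query_idx + 1],
                             doc_vecs[same]).ravel()
    ranked = same[np.argsort(-sims)]
    return [i for i in ranked if i != query_idx][:k]

print(recommend(0))
```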

Weka Decision Tree getting too big (out of memory)

For classification I used Weka's J48 decision tree to build a model on several nominal attributes. Now there is more data for classification (5 nominal attributes), but each attribute has 3000 distinct values. I used J48 with pruning, but it ran out of memory (4 GB allocated). With a smaller dataset I saw in the output that J48 keeps all leaves, even those with no instances associated with them. Why are they kept in the model? Should I switch to another classification algorithm?
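The first thing to try is simply giving Weka a larger heap when launching it (e.g. `java -Xmx8g ...`). Beyond that, with 3000 distinct values per nominal attribute, J48's one-branch-per-value splits blow up the tree. If switching tools is an option, here is a hedged sketch of the same task in scikit-learn, where binary splits and a minimum leaf size keep the tree small; the ordinal encoding imposes an artificial ordering on the categories, and all parameters and the toy data are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in: 5 nominal attributes, each with up to 3000 distinct values.
rng = np.random.default_rng(0)
X = rng.integers(0, 3000, size=(10000, 5)).astype(str)
y = rng.integers(0, 2, size=10000)

# Binary splits plus a minimum leaf size and depth cap keep the tree far
# smaller than one multiway branch per nominal value.
clf = make_pipeline(
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
    DecisionTreeClassifier(min_samples_leaf=50, max_depth=12, random_state=0),
)
clf.fit(X, y)
```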
