How to get inter-stock relationships using deep learning? - time-series

I'm trying to get the relationships between stock companies based on their historical closing prices. Cross-correlation or other similarity matrices can perform this task, but I want to use deep learning methods (RNN/attention) to extract the relationships between companies.
So I prepared my data as follows:
input data X of shape (samples, stocks, seq_len)
and target data y of shape (samples, stocks)
I did the following, with samples=32, seq_len=15, and stocks=50:
from tensorflow.keras import Sequential, layers
model = Sequential()
model.add(layers.LSTM(50, return_sequences=True, input_shape=(15, 50), activation='relu'))  # input_shape=(seq_len, stocks); the batch size is not part of input_shape, and X is arranged as (samples, seq_len, stocks)
model.add(layers.LSTM(50, return_sequences=False, activation='tanh'))
model.add(layers.Dense(50, activation='relu'))
But I'm confused how to get relationship between stocks.
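For illustration, a rough sketch of the direction I have in mind (layer sizes and the next-step-price target are arbitrary choices, not fixed requirements): a shared LSTM encodes each stock's window, self-attention runs across the stock axis, and the (stocks x stocks) attention weights could be read as pairwise relationship scores.

from tensorflow.keras import layers, Model

stocks, seq_len = 50, 15

# Each stock's closing-price window as a univariate sequence: (stocks, seq_len, 1).
inp = layers.Input(shape=(stocks, seq_len, 1))

# Shared LSTM encoder applied to every stock independently -> (stocks, 64).
emb = layers.TimeDistributed(layers.LSTM(64))(inp)

# Self-attention across the stock axis; scores have shape (heads, stocks, stocks).
ctx, scores = layers.MultiHeadAttention(num_heads=1, key_dim=64)(
    emb, emb, return_attention_scores=True)

# Predict the next closing value for every stock from its attended context.
out = layers.Reshape((stocks,))(layers.TimeDistributed(layers.Dense(1))(ctx))

model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')

# After training, a second model that outputs the attention scores can be
# inspected (averaged over samples and heads) as an approximate
# inter-stock relationship matrix.
relation_model = Model(inp, scores)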


Ideas for handling problem of multiple records for one observation?

Background
I am trying to create a model that can predict Type 2 diabetes in a patient based on MRI scans of their thigh muscle. Previous literature has shown that fat deposition in the muscle of the femur is linked to Type 2 diabetes, so there is some valid relationship here.
I have a dataset comprising several hundred patients. I am analyzing radiomics features of their MRI scans, which are basically quantitative imaging features (think things like texture, intensity, variance of texture in a specific direction, etc.). The kicker here is that an MRI scan is a three-dimensional object, but I have radiomics features for each of the 2D slices, not radiomics of the entire 3D thigh muscle. So this dataset has repeated rows for each patient, "multiple records for one observation." My objective is to output a binary classification of Yes/No for T2DM for a single patient.
Problem Description
Based on some initial exploratory data analysis, I think the key here is that some slices are more informative than others. For example, one thing I tried was to group the slices by patient, and then analyze each slice in feature hyperspace. I selected the slice with the furthest distance from the center of all the other slices in feature hyperspace and used only that slice for the patient.
I have also tried just aggregating all the features, so that each patient is reduced to a single row but has many more features. For example, there might be a feature called "median intensity"; the patient would then have five derived features, called "median intensity__mean", "median intensity__median", "median intensity__max", and so forth. These aggregations are across all the slices that belong to that patient. This did not work well and yielded an AUC of 0.5.
I'm trying to find a way to select the most informative slices for each patient that will then be used for the classification; or an informative way of reducing all the records for a single observation down to a single record.
Solution Thoughts
One thing I'm thinking is that it would probably be best to train some sort of neural net to learn which slices to pick before feeding those slices to another classifier. Effectively, the neural net would learn a linear transformation applied to the matrix of (slices, features) for each patient, so that some slices are upweighted while others are downweighted. Then I could compute the mean along the slice axis and use that as input to the final classifier. If you have examples of code for how this would work, please share; I'm not sure how you would hook up the loss function of the final classifier (in my case, an LGBMClassifier) to the neural net so that backpropagation occurs from the final classification all the way through the ensemble model.
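For illustration, a minimal Keras sketch of this slice-weighting idea (the slice/feature counts are assumed, not my real data): a small layer scores each slice, the softmax-normalised scores form a weighted mean over slices, and a dense sigmoid head stands in for the final classifier so a single loss can backpropagate end to end (gradients cannot flow back through a separate LGBMClassifier).

import tensorflow as tf
from tensorflow.keras import layers, Model

n_slices, n_features = 40, 100   # assumed dimensions

inp = layers.Input(shape=(n_slices, n_features))

# One learned score per slice, normalised across slices.
scores = layers.Dense(1)(inp)               # (n_slices, 1)
weights = layers.Softmax(axis=1)(scores)    # slice weights sum to 1

# Weighted mean over the slice axis -> one row per patient.
pooled = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([inp, weights])

out = layers.Dense(1, activation='sigmoid')(pooled)   # P(T2DM)

model = Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['AUC'])

# model.fit(X, y)  # X: (patients, n_slices, n_features), y: (patients,)
# Afterwards, the learned weights show which slices mattered, or the pooled
# patient-level vectors can be extracted and fed to LGBMClassifier as features.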
Overall, I'm open to any ideas on how to approach this issue of reducing multiple records for one observation down to the most informative / subset of the most informative records for one observation.

Hybrid recommendation system with matrix factorization and linear regression

I'm following a tutorial for creating a recommendation system in BigQueryML. The tutorial uses matrix factorization first to calculate user and item factors. In the end I have a model that can be queried with user ids or item ids to get recommendations.
The next step is feeding the factors and additional item + user features into a linear regression model to incorporate more context.
"Essentially, we have a couple of attributes about the movie, the
product factors array corresponding to the movie, a couple of
attributes about the user, and the user factors array corresponding to
the user. These form the inputs to our “hybrid” recommendations model
that builds off the matrix factorization model and adds in metadata
about users and movies."
I just don't understand why the dataset for linear regression excludes the user and item ids:
SELECT
p.* EXCEPT(movieId),
u.* EXCEPT(userId),
rating
FROM productFeatures p, userFeatures u
JOIN movielens.ratings r
ON r.movieId = p.movieId AND r.userId = u.userId
My question is:
How will I be able to get recommendations for a user from the linear model, when I don't have the user or item ids in the model?
Here you can find the full code:
https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/recommendation_systems/solutions/als_bqml_hybrid.ipynb
In the example you have shared, the goal is to fit a linear regression to the discovered factor values so that a novel set of factor values can be used to predict the rating. In this kind of setup, you don't want information about which samples are being used; the only crucial information is the training features (the factor scores) and the rating (the training/test label). For more on this topic, take a look at "Dimensionality reduction using non-negative matrix factorization for information retrieval."
If you included the movie ids and user ids as features, the regression would try to learn from them, which would either add noise to the model or learn spurious patterns such as "lower ids = lower scores". That is quite possible if the ids happen to follow some ordering you're not aware of, such as chronological or by genre.
Note: You could use movie-specific or user-specific information to build a model, but you would have many, many dimensions of data, and that tends to create poorly performing models. The idea here is to avoid the curse of dimensionality by first reducing the dimensionality of the problem space. Matrix factorization is just one method among many to do this. See, for example, PCA, LDA, and word2vec.
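A toy sketch of the same idea in Python (made-up shapes and data, not the BigQuery pipeline): the ids are only used to look up factor vectors and metadata, and the regression itself never sees them as features.

import numpy as np
from sklearn.linear_model import LinearRegression

n_users, n_movies, k = 100, 200, 8
rng = np.random.default_rng(0)

user_factors = rng.normal(size=(n_users, k))    # from matrix factorization
movie_factors = rng.normal(size=(n_movies, k))
user_meta = rng.normal(size=(n_users, 3))       # e.g. user attributes
movie_meta = rng.normal(size=(n_movies, 2))     # e.g. movie attributes

def features(u, m):
    # The ids u and m only index the lookup tables; they are not features.
    return np.concatenate([user_factors[u], user_meta[u],
                           movie_factors[m], movie_meta[m]])

# Training rows come from observed (userId, movieId, rating) triples.
ratings = [(0, 5, 4.0), (1, 7, 3.5), (2, 5, 5.0)]
X = np.array([features(u, m) for u, m, _ in ratings])
y = np.array([r for _, _, r in ratings])
reg = LinearRegression().fit(X, y)

# Recommendations for user 1: score every candidate movie and rank.
candidates = np.array([features(1, m) for m in range(n_movies)])
top10 = np.argsort(reg.predict(candidates))[::-1][:10]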

Reversing machine learning models to get particular features

I am trying to model a process. My input data includes certain features and measurements about the product. I built random forest and gradient boosting models in Python and got good results. I am now trying to determine which features and measurements lead to the best product (almost like reversing an equation to get the x variable back for a particular y). How can I go about doing this?
This is basically a feature selection problem, so here are some methods you could try out.
Feature selection
I was using some of the below for my feature selection; they rank your features based on the spread of the data:
Fisher's score
F-score
Chi-squared
I found the above useful.
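Here is a minimal scikit-learn sketch of this kind of univariate ranking, with assumed data shapes: f_classif is the ANOVA F-score and chi2 is the chi-squared test (chi2 needs non-negative features). Fisher's score is not built into scikit-learn, but is available in packages such as scikit-feature.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, chi2

rng = np.random.default_rng(0)
X = rng.random((200, 30))           # 200 products x 30 features/measurements
y = rng.integers(0, 2, size=200)    # 1 = good product, 0 = not

# Rank features by ANOVA F-score and keep the 10 strongest.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
print("Top features by F-score:", np.argsort(selector.scores_)[::-1][:10])

# Same idea with the chi-squared statistic.
chi_scores, _ = chi2(X, y)
print("Top features by chi-squared:", np.argsort(chi_scores)[::-1][:10])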

Deep learning on tri-axial data

I have a series of tri-axial accelerometer data of dimension (N, 1000, 3), where N is the number of instances, 1000 is the length of the acceleration data (i.e. 10 seconds sampled at 100 Hz) and 3 are the axes X, Y and Z. The data is also divided into two classes, A and B, where A accounts for 95% of the data. In total I have just under 3000 instances of class B. The aim of my project is to create a model to detect class B.
I have been creating a number of machine learning models (decision trees, boosted models, etc.) with features obtained via signal processing and statistics (e.g. standard deviation, mean, magnitude, area under curve, etc.). These models perform well, but they seem to miss a number of real-world events that I can distinguish by eye. This led me to believe that my features are missing key components of the classes. I've been going down the rabbit hole of signal processing, but to date nothing has produced that Eureka moment.
Now I am no expert in deep learning, but combining the data into a single axis (i.e. taking the magnitude) gave promising results (i.e. just as good as the current models). However, taking the magnitude also removes information. So I was wondering if there is a way to use deep learning to 1. select features from the individual axes and 2. use these as input for another deep learner to perform the classification. Something like this:
My simple view of a multi-axis deep learner: the individual axes (i.e. X, Y and Z) are fed into separate deep learners and their outputs are then fed into a single deep learner.
Apologies for all the text and the lack of examples; I'm not allowed to share the data and am only looking for guidance on whether deep learning can help. Thanks for taking the time to read my post.
Since there are no specifics in the question, the answer can only be given in general terms.
If the magnitude gives good results, you can feed X, Y, Z and the magnitude into a single deep learner as 4 inputs.
In this case, your deep learner will be able to use a) separate features of each axis, b) the data combined into a single axis, and c) the relationships between the axes.
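A minimal Keras sketch of that suggestion, with assumed layer sizes: X, Y, Z and the magnitude are stacked as 4 channels of one sequence, and class weighting is one simple way to handle the 95/5 imbalance.

import numpy as np
from tensorflow.keras import layers, models

def add_magnitude(acc):                                 # acc: (N, 1000, 3)
    mag = np.linalg.norm(acc, axis=-1, keepdims=True)
    return np.concatenate([acc, mag], axis=-1)          # (N, 1000, 4)

model = models.Sequential([
    layers.Input(shape=(1000, 4)),
    layers.Conv1D(32, kernel_size=9, strides=2, activation='relu'),
    layers.Conv1D(64, kernel_size=9, strides=2, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid'),              # P(class B)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['AUC'])

# model.fit(add_magnitude(X_train), y_train, class_weight={0: 1.0, 1: 19.0})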

Is training data required for collaborative filtering methods?

I'm about to start writing a recommender system for videos, mostly based on collaborative filtering as video metadata is pretty sparse, with a bit of content-based filtering as well. However, I'm not sure what to do about training data. Is training data something of importance in recommender systems, specifically in collaborative methods? If so, how can I generate that kind of data, or what type of data should I look for?
Any ML algorithm needs data. Take the matrix factorization approach, for example.
It receives an (incomplete) matrix of ratings: rows represent users, columns represent items, and each cell contains the rating that a particular user gave a particular item. By factorizing this matrix you obtain a latent vector representation for each user and each item, which allows you to predict future ratings. Obviously, the unseen items with the highest predicted ratings are the most interesting to the user, according to the model.
Essentially, matrix factorization learns to predict new ratings for known users and items.
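A toy sketch (made-up numbers) of what the training data looks like in practice: just observed (user, item, rating) triples, to which two small factor matrices are fit and then used to score unseen items.

import numpy as np

# Observed (user, item, rating) triples: this is the "training data".
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (2, 0, 1.0)]
n_users, n_items, k = 3, 2, 2

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))   # user latent vectors
V = rng.normal(scale=0.1, size=(n_items, k))   # item latent vectors

lr, reg = 0.05, 0.01
for _ in range(500):                            # plain SGD on squared error
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        u_old = U[u].copy()
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

# Predicted rating for user 2 on item 1, which was never rated in training.
print(U[2] @ V[1])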
