Machine learning predict text fields based on text fields - machine-learning

I have been working on machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning, and PredictionIO. What I want to do is predict a text field based on other fields. My CSV file has four text fields named Question, Summary, Description, and Answer, and about 4500 lines/records. There are no numerical fields in the uploaded dataset. A typical record looks like the one below.
{'Question':'sys down','Summary':'does not boot after OS update','Description':'Desktop does not boot','Answer':'Switch to safemode and rollback last update'}
On IBM Watson, I found a forum question with a reply saying that custom corpus upload is not possible right now, so I moved to Amazon Machine Learning. I followed their documentation and was able to implement prediction in a custom app using the API. I tested on MovieLens data, where everything was numerical: I successfully uploaded the data and got movie recommendations with their python-boto library. When I tried uploading my CSV file, the problem was that no text field could be selected as the target. I then added numerical values corresponding to each value in the CSV. This approach made prediction run, but the accuracy was poor. Maybe the CSV needs to be formatted in a better way.
A record from the MovieLens data is pasted below. It says that userID 196 gave movieID 242 a three-star rating at time (Unix timestamp) 881250949.
196 242 3 881250949
Currently I am trying PredictionIO. A test on the MovieLens dataset ran successfully, as described in the documentation, using the recommendation template. But it is still unclear to me whether a text field can be predicted from other text fields.
Does prediction run on numerical fields only, or can a text field be predicted based on other text fields?

No, prediction does not run only on numerical fields. The target can be anything, including text. My guess is that the MovieLens data uses IDs instead of actual user and movie names because
this saves storage space (the dataset has been around for a long time, and back then storage was definitely a concern), and
there is no need to know the actual user names (a privacy concern).
For your case, you might want to look at the text classification template (https://docs.prediction.io/demo/textclassification/). You will need to decide how each record should be classified; for example, each distinct Answer could serve as a class label.
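To make the idea of classifying records concrete, here is a minimal sketch in plain Python. It is not the PredictionIO template; it is a small multinomial naive Bayes classifier written only for illustration. It assumes the three input fields are joined into one text and that each distinct Answer string is treated as a class label. The class name `NaiveBayesText`, the helper names, and the second training record are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def record_text(record):
    """Join the three input fields into a single piece of text."""
    return " ".join(record[k] for k in ("Question", "Summary", "Description"))


class NaiveBayesText:
    """Multinomial naive Bayes with add-one smoothing; labels are Answer strings."""

    def fit(self, records):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of records
        self.vocab = set()
        for rec in records:
            label = rec["Answer"]
            words = tokenize(record_text(rec))
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, record):
        words = tokenize(record_text(record))
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of this answer
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data: the first record is from the question, the second is invented.
train = [
    {"Question": "sys down", "Summary": "does not boot after OS update",
     "Description": "Desktop does not boot",
     "Answer": "Switch to safemode and rollback last update"},
    {"Question": "no network", "Summary": "wifi drops after sleep",
     "Description": "Laptop loses wifi connection",
     "Answer": "Reinstall the wireless driver"},
]

model = NaiveBayesText().fit(train)
query = {"Question": "pc down", "Summary": "will not boot",
         "Description": "boot fails after update"}
predicted = model.predict(query)
print(predicted)
```

This only works when the same answers recur across many records. If most of the 4500 answers are unique free text, a retrieval approach (finding the most similar past record) may fit better than classification.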

Related

Is there a way to generate a phrase based on two column entries?

I’m not sure how to go about this or if I’ll even explain it correctly, so bear with me.
I’m trying to make a spreadsheet for rotational play in The Sims 4 so I can maybe have fun with the game again.
I have two columns with data validation drawn from two other sheets in the main document. Say one column holds “astronaut - smuggler branch” and the column next to it holds the level, let’s say 8. Is there a way to generate the job title for that level from those two entries?
I have no idea lol I’m kind of high and at a loss for words ᕕ( ᐛ )ᕗ

Tableau - Dynamic Datasets

I'm not sure what the best way to describe the problem I'm trying to solve would be.
Basically, my datasets are model outputs that are generated in the same format on a daily basis.
I have built a dashboard around one dataset, but I want to create a dynamic filter that checks for the output files in a folder and updates the visuals for the dataset I select.
I can create data connections for the existing datasets, and that will work, but since the datasets are updated daily, is there a way to create such a filter?
I don't know how to let the user select any arbitrary data source without knowing the choices in advance. Maybe there is a way. But maybe you don't need to do that.
Suppose you generate new datasets each day and they are always called by the same names, such as data-monday, data-tuesday, data-wednesday, etc. So you always have exactly the same seven datasets to pick from.
Then each dataset could have a field in it corresponding to the name, such as "WHAT-SET" with values, say, "monday", "tuesday", "wednesday", etc.
Then your data step could import all seven data sets and UNION them together, and your user could use a parameter to filter on "WHAT-SET" to pick the desired one.
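The union step described above could also be done before the data reaches Tableau. The following is a rough sketch in Python, assuming the daily outputs are CSV files named like data-monday.csv in a single folder. The function name `union_daily_files`, the file pattern, and the demo columns are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def union_daily_files(folder, pattern="data-*.csv", tag_field="WHAT-SET"):
    """Stack every matching CSV and tag each row with the set it came from."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        set_name = path.stem.split("-", 1)[1]  # "data-monday" -> "monday"
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row[tag_field] = set_name
                rows.append(row)
    return rows


# Demo with two tiny made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for day, value in (("monday", "1.5"), ("tuesday", "2.5")):
        with open(Path(folder) / f"data-{day}.csv", "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["metric", "value"])
            writer.writerow(["score", value])
    combined = union_daily_files(folder)

# Filtering on the tag field plays the role of the Tableau parameter.
selected = [row for row in combined if row["WHAT-SET"] == "tuesday"]
print(selected)
```

Writing `combined` back out as one file would give Tableau a single data source whose "WHAT-SET" field can drive the filter.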

How can I render time-series data on a geographic display in grafana?

My goal is to render time-series data from set locations on a map. Essentially, I have about 30 predefined (static) locations in Switzerland from which I will be receiving real-time data. The data itself is relatively simple, just the signal/noise ratio of the signal we're receiving, which should be updated every few seconds or every minute. I am using InfluxDB as my database. Are there any specific setups I should be using for this kind of visualization?
My first question is: is it best to use the worldmap panel or the geomap panel at this time? I seem to be finding more information/documentation on the worldmap panel, even though I have also read that geomap is (or at least will be) its replacement.
Second, I assume that since I'm using time-series data, that I should be using the Time-Series format, and not the Table format. However, I have not been able to render any data points using the time-series feature, even by following the simplest of examples in your documentation. The best I can do is use the Table feature, and internally remove previous points from my database at every iteration (so that multiple points aren't rendered at the same time for each location). Here are two screenshots of when I'm able to render data on the geomap using the Table format, and then after switching to Time-Series format that the points are no longer there (note that I have the same problem with the Worldmap application as well).
I'm able to render data using the Table method:
...but not using time series:
Thanks for any help!
For rendering time-series data on the geomap, you must convert your lat/long fields to a single geohash field. You'll have to do that before inserting the lat/longs into InfluxDB.
See this answer
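For the conversion step mentioned in the answer, here is a minimal sketch of the standard geohash encoding in plain Python, to be run on each location before writing it to InfluxDB. The function name `geohash_encode`, the default precision of 9, and the sample coordinates are choices made for this illustration; an existing geohash library would do the same job.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Hypothetical station roughly in Bern; replace with the real coordinates.
station_hash = geohash_encode(46.948, 7.447)
print(station_hash)
```

Since the 30 locations are static, the geohash for each one only needs to be computed once and can then be attached to every data point written for that location.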

Adding one more feature to my feature set which has no effect in calculation and acts as a distinguishing feature

I have a problem as follows:
I have 10,000 tweets and some features, labeled 1 or 2. I want to add another feature, and this is where the problem lies:
I want to give each tweet a user ID as a feature, based on the user who posted it, so if a user posted 3 tweets, all 3 get the same ID. My reasoning is that if most of a user's tweets are assigned the same label, a new tweet from that user is more likely to belong to the same group, so I can be more confident when classifying it. I am using a decision tree and naive Bayes. Does this make sense, given that the feature is not numeric, has no effect in the calculation, and acts as a dummy feature that only distinguishes the tweets?

Creating a data visualization site with Rails

I have a very large Excel spreadsheet that consists of a user name, a location, a date, and some numeric fields, for example:
User,location,date,value1,value2,value3
Steve,NYC,2012,9,1,3
Steve,NYC,2011,3,3,2
Steve,CA,2011,1,2,0
Michael,CA,2012,10,3,2
Michael,CA,2011,10,2,0
How would I go about organizing a Rails site such that one can view all the values for a certain user?
For example,
/users/steve/all
would display all the values in descending order of date where user=steve.
/users/steve/nyc
would display all the values in descending order of date where user=steve and location=nyc.
I think I would need to create a User model and import all the data from the Excel file into the database, but I'm lost about how to do that.
The application, in essence, would be a simple data visualizer. Maybe I have to split the data so that a user has_many :locations and a location belongs_to :user; I'm not sure. I want the data to be viewable in all sorts of ways: maybe I want to display all the users from a certain location, or view all the locations of a certain user, and so on.
I suggest setting up your model within your Rails application first. Then you can write a rake task, probably similar to the one in this question, or build it from scratch. There's also a Railscast.
If you need to import directly from Excel (e.g. the Excel sheets are uploaded by a user), you can use one of the following gems to import data from an Excel sheet:
roo: reads new and old Excel formats (and others)
spreadsheet: reads and writes the old Excel format
If you only have this one Excel sheet, it will be far easier to export the data to CSV and follow the answers given in the Stack Overflow question mentioned above.
As for the second part of your question, on how to model your database, you already had the right idea.
The easiest approach is to fully model what your domain looks like and transform the data accordingly.
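The import-then-query flow can be sketched outside Rails to show its shape. The following is not Rails code; it is a small Python illustration using an in-memory SQLite database, with the sample rows from the question. The table name `readings`, the column name `user_name`, and the helper `values_for` are hypothetical. In Rails, the same steps would live in a rake task and a controller action.

```python
import csv
import io
import sqlite3

CSV_TEXT = """User,location,date,value1,value2,value3
Steve,NYC,2012,9,1,3
Steve,NYC,2011,3,3,2
Steve,CA,2011,1,2,0
Michael,CA,2012,10,3,2
Michael,CA,2011,10,2,0
"""

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE readings (user_name TEXT, location TEXT, date INTEGER,"
    " value1 INTEGER, value2 INTEGER, value3 INTEGER)"
)
rows = list(csv.reader(io.StringIO(CSV_TEXT)))[1:]  # skip the header row
db.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?)", rows)


def values_for(user, location=None):
    """Rows for one user, newest first; optionally limited to one location."""
    sql = ("SELECT location, date, value1, value2, value3 FROM readings"
           " WHERE lower(user_name) = ?")
    params = [user.lower()]
    if location is not None:
        sql += " AND lower(location) = ?"
        params.append(location.lower())
    sql += " ORDER BY date DESC"
    return db.execute(sql, params).fetchall()


steve_all = values_for("steve")         # corresponds to /users/steve/all
steve_nyc = values_for("steve", "nyc")  # corresponds to /users/steve/nyc
print(steve_nyc)
```

A single flat table is enough for both routes here. Splitting users and locations into separate models mainly pays off once you also want views such as all users for a given location.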
