Products category detection using weka - machine-learning

I'm working on an e-commerce application. Most of my products have a category attribute, but some do not (roughly a 70/30 split). I was trying to use Weka to detect the missing categories, but the attributes I have are strings (name, brand, price, description, category), so none of the classifiers work: they need the attributes to be numeric, nominal, or binary.
Has anyone faced this problem before?

Discretize the continuous attributes and then it will work, because some of the algorithms do not work with continuous values.
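To illustrate what discretizing means here, a minimal sketch of equal-width binning in plain Ruby. This is not Weka's API; the `discretize` helper, the bin labels, and the sample prices are all made up for the example.

```ruby
# Equal-width discretization: map each numeric value to one of `bins` labelled ranges.
# `discretize` is a hypothetical helper, not part of Weka.
def discretize(values, bins: 3)
  min, max = values.minmax
  width = (max - min).to_f / bins
  # All values identical: everything lands in a single bin.
  return values.map { "bin_0" } if width.zero?

  values.map do |v|
    # Clamp so the maximum value falls in the last bin instead of opening a new one.
    index = [((v - min) / width).floor, bins - 1].min
    "bin_#{index}"
  end
end

prices = [5.0, 12.5, 20.0, 47.0, 50.0]
labels = discretize(prices, bins: 3)
```

Each price becomes a nominal label, which is the kind of attribute the classifiers in question can consume.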

Use the "StringToWordVector" filter, which converts your string attribute(s) to numeric attributes.
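Conceptually, turning strings into numeric attributes means building a vocabulary and counting words per record. A rough plain-Ruby sketch of that idea follows; it is not Weka's implementation, and `tokenize`, `word_vectors`, and the product names are assumptions for illustration only.

```ruby
# Split text into lowercase alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build a sorted vocabulary over all documents, then one count vector per document.
def word_vectors(documents)
  vocabulary = documents.flat_map { |doc| tokenize(doc) }.uniq.sort
  vectors = documents.map do |doc|
    counts = tokenize(doc).tally
    vocabulary.map { |word| counts.fetch(word, 0) }
  end
  [vocabulary, vectors]
end

names = ["Red cotton shirt", "Blue cotton shirt", "Red leather wallet"]
vocabulary, vectors = word_vectors(names)
```

Every word in the vocabulary becomes one numeric column, so each product name ends up as a row of counts that a classifier can work with.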

Related

Rails - How to Model Dynamic Form Fields?

I have a request model. A request has one classification. What I want to set up is to store a bunch of form fields in the DB. Their types, names etc. Different classifications will have different form fields for the user to fill out on a request form. So ultimately User creates new request with classification C, and they are presented with a form with the appropriate fields for classification C.
I would like the values stored in a table with the request. My question is how should this be modeled?
Request has one classification.
Classification has_many requests.
I'm just not sure what to do with the dynamic form fields. I would like to be able to create the fields and attach them to the classification. So if first name and last name are fields that are needed, I wouldn't have to create them for every classification. I would just create them once and associate them with a classification through a join table.
Looking for advice on how to model this out and be able to easily reference them from a request.
Thanks! Any info or thoughts are appreciated.
I would say that you should first try to model it according to the relational model as far as possible.
# Beware of potential conflicts with this name, as it clashes with a core method in controllers
class Request < ApplicationRecord
  has_many :classifications
end
class Classification < ApplicationRecord
  belongs_to :request
end
Model everything you know you can normalize. It's usually more than you think.
Data that doesn't adhere to a fixed schema can then be handled in a few ways:
Just define all the fields and live with a few nulls here and there.
The Entity–attribute–value (EAV) pattern. This classic approach consists of a separate table where each row represents a value for a classification, e.g. rails g model ClassificationAttribute classification:references attr_name attr_value. This is largely made obsolete by JSON data types.
A JSON/JSONB column. This additional column would be used to store any unstructured data that cannot be normalized.
Serialized data columns. This is also made obsolete by JSON/JSONB.
All of these can be combined with the Single Table Inheritance pattern.
If classification can be broken down into a limited number of variants, you could consider Multiple Table Inheritance, where you store the base data in the classification table and then use separate tables for the more specific data. Rails' delegated_type feature can be used for this.
Your question is really confused and it is hard to understand what you are trying to achieve. But a few remarks:
You say "Request has one classification. Classification has_many requests". But if Request has one classification, then Classification should belong to Request. This way the Classification model holds a field called request_id (a foreign key) that helps ActiveRecord link the two models together. (The child model is the one holding the foreign key.)
If each is the parent of the other (has_one or has_many), then where is the foreign key?
Dynamic fields are not really possible. Your database is hard-coded: each field is declared in the relational database, and Rails' ActiveRecord lets you access and validate it easily. There is, however, a workaround: have one of the models hold a JSON or JSONB field. Instead of being one of the common types (string, text, integer, ...), the value is of JSON type and holds data that Rails converts to a hash:
{
  first_name: "Arthur",
  last_name: "Smith",
  age: "23"
}
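As a rough illustration of what storing such a hash in a JSON column implies, here is a plain-Ruby round trip using only the standard library. It does not touch a database; `stored` merely stands in for the text a JSON column would hold.

```ruby
require "json"

fields = { first_name: "Arthur", last_name: "Smith", age: "23" }

stored = JSON.generate(fields) # stands in for what a JSON column would hold
loaded = JSON.parse(stored)    # what comes back after loading

# Symbol keys do not survive the round trip: JSON object keys are strings.
by_string = loaded["first_name"]
by_symbol = loaded[:first_name]
```

The thing to watch for is that keys come back as strings, so code reading the field should use string keys (or convert explicitly).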
This is pretty convenient for shopping carts, as you can save an actual list of items rather than an association. With an association you would need to version changes to your items (when the price of an item changes, for example), which takes some good engineering.
The question is: is this what you really want to do? This option doesn't fit all apps or uses.
Also, you say the request depends on the classification. I mentioned the foreign key problem above. But it seems odd that the behavior of one of your records is set by a direct relationship. Who creates the classification? Is it one of the app models, such as a User or an Admin? Or is it seeded by the app creator (in which case Classification is a standalone model)? In that case the Classification preexists the Request, and maybe a has_and_belongs_to_many association (a join table) would fit better...
Maybe give us a clearer view of what you want to achieve, with real-life examples, so we can help further.

How to use external regressors for training ARIMA_PLUS model in BigQuery?

I created a model in BigQuery. Is it possible to include additional columns as external regressors?
For example, I'd like to include Date, Users, pages per session, bounce rate, etc. for forecasting users.
create or replace model bqml_tutorial.create_model
options (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'Date',
  time_series_data_col = 'Users',
  auto_arima = True,
  data_frequency = 'AUTO_FREQUENCY',
  decompose_time_series = True
)
as
select Date, cv as Users
from `bqml_tutorial.cvrate`
ORDER BY Date
Looking at the documentation, this is currently not available. The ARIMA_PLUS model that you can train in BigQuery already does a lot of things (seasonality study, outlier removal, missing data interpolation, etc.). But in terms of external regressors, you cannot add specific columns for training your model.
The only additional data you can put into your model is the holiday information (with the HOLIDAY_REGION option), which is already awesome!
Note that you can train models for multiple time series at the same time by naming the column that identifies each series in the TIME_SERIES_ID_COL parameter. But that will get you forecasts for all these series from independent models (so the effect of one series on another will not be modelled).

Rails model with multiple values for a field

I have a model Movie that can have multiple Showtimes. Each Showtime is a pair of start and end times. Movies get saved in the database.
So although a Movie might have_many Showtimes, does that really need to be a model, or just a class, or some kind of custom tuple-like type?
I have seen where you can have a field with an array of values, but this would not be basic values as each value is a pair of times.
What is the best way to achieve this?
Showtimes should be a model, yes. Here are a few reasons:
Most relational databases don't natively support a tuple or array type.
What if you want to query movies occurring at a particular time? This would be difficult to do with a custom field, but would be relatively trivial with a separate table.
Most importantly, it enables better flexibility and extensibility through decreased coupling. For instance, does a showtime always exist exclusively to a movie? What if you want to extend your schema to add theatres where each theatre has many showtimes?
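To make the querying point concrete, here is a small plain-Ruby sketch in which each showtime is its own record. The `Showtime` struct, the `movies_playing_at` helper, and the sample data are assumptions for illustration; in a Rails app the same lookup would be a query on a showtimes table.

```ruby
# Each showtime is a separate record pointing at its movie (by title, for brevity).
Showtime = Struct.new(:movie_title, :starts_at, :ends_at) do
  # True when `time` falls inside this showtime (end time exclusive).
  def covers?(time)
    starts_at <= time && time < ends_at
  end
end

showtimes = [
  Showtime.new("Alien", Time.utc(2024, 1, 5, 18, 0), Time.utc(2024, 1, 5, 20, 0)),
  Showtime.new("Alien", Time.utc(2024, 1, 5, 21, 0), Time.utc(2024, 1, 5, 23, 0)),
  Showtime.new("Heat",  Time.utc(2024, 1, 5, 19, 0), Time.utc(2024, 1, 5, 22, 0))
]

# Hypothetical helper: titles of all movies with a showtime covering `time`.
def movies_playing_at(showtimes, time)
  showtimes.select { |s| s.covers?(time) }.map(&:movie_title).uniq
end

playing = movies_playing_at(showtimes, Time.utc(2024, 1, 5, 19, 30))
```

With showtimes as separate rows, "what is playing at 19:30" is a simple filter, and adding a theatre later only means one more field on the showtime record.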

Best way to structure database objects with multiple features and attributes in Rails

I have a product model. Each product has a different feature set, and has many features.
Instead of creating a product model that lists all of its features (since this would involve including a lot of features I do not need, and adding new features later would be difficult), my thought was to create one column that stores a "features array". So, say this product is a laptop and I wanted to know the screen size, I could call:
  @laptop.features[:screen]
  => "15.6 inch"
The problem with this is that I am not sure there is a simple and practical way to build a form that could accept various features and then map them to the array.
I found a railscast (#196) that explains there is accepts_nested_attributes_for built into Rails, which would basically mean using both a Product model and a Feature model and just associating the two records.
Which way would be better? Is there a common approach for this sort of problem? And is there a way to have a form in your view that would accept features (even if they are not directly a part of the Product model's database structure)?
I would definitely go with the more flexible solution of having a has_many relationship with features. Then you can easily call @product.features to get the product's features, and the flexibility really shines when you want to do something like assign multiple attributes to screen. If you are throwing hashes into your database, you wouldn't be able to add two attributes (easily, anyway) to screen.
Say you wanted @product.features[:screen] to show IPS or TFT in the future as well as size; then you would need nested hashes or something else that would be really ugly to process.
Perhaps a Features table that contains the features that you want to mention for the products, probably with a type attribute. Maybe "Type: Display, Key: Size, Value: 1920x1080", or "Type: HDD, Key: Capacity, Value: 2GB". You can use the type to create 'families' of keys. You can make your keys anything you want to track, and the value is just a string.
With that Feature list built, you create a joining table/model (assignments?) that tracks which product has which features.
Product (id, non-feature attributes)
  has_many assignments
  has_many features, through assignments
Feature (id, type, key, value)
  has_many assignments
  has_many products, through assignments
Assignments (id, product_id, feature_id, timestamps?)
  belongs_to product
  belongs_to feature
Given that you're linking a product id to a feature id, you can fiddle with your feature value text without breaking anything. Decide that "1920x1080" should be "1920px x 1080px" -- just change the feature record.
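The join-table layout above can be sketched in plain Ruby, with Structs standing in for the three tables. The `features_for` helper and the sample records are made up for the example; this is a sketch of the idea, not ActiveRecord code.

```ruby
# Structs stand in for the products, features, and assignments tables.
Feature    = Struct.new(:id, :type, :key, :value)
Product    = Struct.new(:id, :name)
Assignment = Struct.new(:product_id, :feature_id)

features = [
  Feature.new(1, "Display", "Size", "1920x1080"),
  Feature.new(2, "HDD", "Capacity", "2GB")
]
products = [Product.new(1, "Laptop A"), Product.new(2, "Laptop B")]
assignments = [
  Assignment.new(1, 1), Assignment.new(1, 2), # Laptop A: display + HDD
  Assignment.new(2, 1)                        # Laptop B: display only
]

# Hypothetical helper: resolve a product's features through the join records.
def features_for(product, assignments, features)
  ids = assignments.select { |a| a.product_id == product.id }.map(&:feature_id)
  features.select { |f| ids.include?(f.id) }
end

# Editing the shared feature record once changes it for every linked product.
features.first.value = "1920px x 1080px"
sizes = products.map do |product|
  features_for(product, assignments, features).find { |f| f.key == "Size" }.value
end
```

One caveat if this is turned into real models: ActiveRecord reserves a column named type for single table inheritance by default, so the feature "type" column would probably need another name (kind, for instance) or a changed inheritance column.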

Rails way for multi product types shopping cart

I'm designing an application which manages the renting of lots of different equipment, and I am wondering what the best way is to design the models for the application. My software has to manage lots of different types of equipment (with data types), for example:
Speaker
  Make - String
  Model - String
  Wattage - Integer
  Price - Decimal
Light
  Make - String
  Model - String
  Wattage - Integer
  Price - Decimal
Microphone
  Make - String
  Model - String
  Use - Choice of: Instrumental, Vocal, Versatile
  Price - Decimal
Cable
  Length - Decimal
  Connector 1 - String
  Connector 2 - String
  Price - Decimal
Stand
  Type - Choice of: Microphone, Speaker
  Height - Decimal
  Boom - Boolean
  Price - Decimal
Ways I have thought about the design:
An individual model for each type of product, then a polymorphic association in the cart so that it can handle all the types of equipment.
A single product model that has fields for all types of equipment, with a type field which can be checked whenever the product is used.
A product model with a price attribute, which every type of product then extends.
But what is the best way in Rails to handle these different types of products?
The Dynamic Attributes gem should allow you to do this automatically:
https://github.com/moiristo/dynamic_attributes
There may be better gems that do what you need, but this is the first I found.
If you're using Postgres as your database, then you can use hstore. There are gems to work with hstore. If you can afford it, get a subscription to RailsCasts and watch the screencast about implementing hstore.
Activerecord-postgres-hstore seems to be the go-to gem for this.
I'd personally go with a single model Product and another model called ProductAttribute.
In this table, you'd have a name column and a value column.
This way, you're not limited by your schema. A product has n product_attributes, named dynamically. In the admin section you can develop shortcuts so that if you create a microphone product, it'll automatically create the specific attribute names in the linked table. You'd just have to input the values.
This way, your application is fully able to sell any sort of product with any number of attributes. No need to code again when, in 3 months, the manager wants to add another type of product :)
Edit: And of course, you'd have a ProductType model to manage all the different product types you can sell.
Another option would be to make a product attributes table and build each product type through an admin interface instead of in low-level code. That way you would not need to alter the application to sell new products.
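A minimal plain-Ruby sketch of that Product / ProductAttribute idea, including the "shortcut" that pre-creates attribute names per product type. The `PRESETS` table, the `set`/`get` helpers, and the sample product are assumptions for illustration; in a real app the presets would presumably live in the ProductType records.

```ruby
# One row per named value, mirroring the name and value columns described above.
ProductAttribute = Struct.new(:name, :value)

class Product
  attr_reader :name, :product_type, :product_attributes

  # Hypothetical presets: attribute names created automatically per product type.
  PRESETS = {
    "Microphone" => %w[Make Model Use Price],
    "Cable"      => ["Length", "Connector 1", "Connector 2", "Price"]
  }.freeze

  def initialize(name, product_type)
    @name = name
    @product_type = product_type
    @product_attributes = PRESETS.fetch(product_type, []).map { |n| ProductAttribute.new(n, nil) }
  end

  # Sets a value; unknown names are appended, so the schema never limits what is stored.
  def set(attr_name, value)
    attribute = product_attributes.find { |a| a.name == attr_name }
    attribute ||= ProductAttribute.new(attr_name, nil).tap { |a| product_attributes << a }
    attribute.value = value
  end

  def get(attr_name)
    product_attributes.find { |a| a.name == attr_name }&.value
  end
end

mic = Product.new("SM58", "Microphone")
mic.set("Use", "Vocal")
mic.set("Colour", "Black") # not in the preset, still accepted
```

The trade-off is that every value is stored the same way (typically as a string), so type checks and choice lists have to be enforced in application code.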
This is a problem that has caused headaches to many vendors of ERP solutions before.
The most elegant solution I would suggest to you based on what I've seen at one such vendor is this.
You define 4 models:
Equipment, EquipmentType, Characteristic, Choice.
There would be a many-to-many relationship between Equipment and Characteristic, going through EquipmentType.
The Characteristic model has an attribute called "value_type" and also one attribute for each value type you have (String, Integer, Decimal, Boolean).
Finally, there would be a one-to-many relationship between Characteristic and Choice.
This is actually a watered-down version of that vendor's implementation which is suited to your particular requirements.
That vendor's actual implementation is actually built at one or two levels of abstraction above what I'm showing you, in order to make the solution more generic. But those people are well-known for over-engineering things.
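One possible reading of that four-model layout, sketched in plain Ruby. The answer leaves where the values live open, so this sketch only covers the definition side: an EquipmentType lists Characteristics, each with a value_type and optional Choices, and incoming values are checked against them. The `accepts?` and `invalid_fields` helpers and the symbols used for value types are assumptions.

```ruby
Choice = Struct.new(:label)

Characteristic = Struct.new(:name, :value_type, :choices) do
  # True when `value` matches the declared value_type and, if choices exist, is one of them.
  def accepts?(value)
    type_ok =
      case value_type
      when :string  then value.is_a?(String)
      when :integer then value.is_a?(Integer)
      when :decimal then value.is_a?(Numeric)
      when :boolean then [true, false].include?(value)
      else false
      end
    type_ok && (choices.empty? || choices.map(&:label).include?(value))
  end
end

EquipmentType = Struct.new(:name, :characteristics)

stand_type = EquipmentType.new("Stand", [
  Characteristic.new("Type", :string, [Choice.new("Microphone"), Choice.new("Speaker")]),
  Characteristic.new("Height", :decimal, []),
  Characteristic.new("Boom", :boolean, [])
])

# Hypothetical helper: names of characteristics whose supplied value is not acceptable.
def invalid_fields(equipment_type, values)
  equipment_type.characteristics.reject { |c| c.accepts?(values[c.name]) }.map(&:name)
end

good = invalid_fields(stand_type, { "Type" => "Speaker", "Height" => 1.8, "Boom" => true })
bad  = invalid_fields(stand_type, { "Type" => "Guitar", "Height" => "tall", "Boom" => nil })
```

New equipment types then become data (a new EquipmentType with its Characteristics) rather than new code.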
HTH.
The third approach is pretty close to the right one. You will definitely want to abstract out all of the universal parameters for the items (such as store ID and, as you mentioned, price) into the base model that every other item will extend. Then, as you mentioned in your first proposed solution, you will have references between the rest of the item classes where necessary, using :references.
As for the "type" and "use", you will probably be best off using a one-to-one relationship with the parent model. Then, store a list of possible field types for each of the models (for example, for Stand, something like possible_uses = "Microphone, Speaker"). Finally, do server-side validation when the model is instantiated to ensure that it's of a valid type. You can also do some hacks that will let you make sure that Microphone and Speaker are the only two possible "uses" that your code actually uses.
A completely different, but cleaner, way to do this would be to do everything I mentioned in the first paragraph, but continue the inheritance down to the lower levels. Specifically, have Microphone extend BaseItem, give Microphone the Make and Model parameters, and then have the models InstrumentalMicrophone, VocalMicrophone, and VersatileMicrophone extend the Microphone class. This will be the cleanest and will allow for full functionality.
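The inheritance chain from the last paragraph could look roughly like this in plain Ruby. The constructor signatures and the `use` method are assumptions for illustration; in Rails, such a hierarchy would usually be backed by single table inheritance and a type column.

```ruby
# Universal parameters live in the base class.
class BaseItem
  attr_reader :price

  def initialize(price:)
    @price = price
  end
end

# Microphone adds the parameters shared by all microphones.
class Microphone < BaseItem
  attr_reader :make, :model

  def initialize(make:, model:, **base)
    super(**base)
    @make = make
    @model = model
  end

  def use
    raise NotImplementedError, "subclasses define their use"
  end
end

# The "Use" choice becomes the class itself, so no invalid value can be stored.
class InstrumentalMicrophone < Microphone
  def use
    "Instrumental"
  end
end

class VocalMicrophone < Microphone
  def use
    "Vocal"
  end
end

class VersatileMicrophone < Microphone
  def use
    "Versatile"
  end
end

mic = VocalMicrophone.new(make: "Shure", model: "SM58", price: 99)
```

The cost of this approach is one class per choice, so it fits best when the set of choices is small and rarely changes.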
