Is one-hot encoding free of the dummy trap? [closed] - machine-learning

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
There is a thing called the dummy trap in one-hot encoding. When we encode a categorical column with 3 categories, let's say a, b, and c, with OneHotEncoder we get 3 columns: a, b, and c. But when we use get_dummies we get 2 columns instead, a and b, so it is safe from the dummy trap. Is one-hot encoding exposed to the dummy trap, or does it take care of it? Am I right? Which one is safe from the dummy trap? Or is it OK to use both without removing columns? I am using the dataset for many algorithms.
Looking for help. Thanks in advance.

In older versions of scikit-learn (before 0.20), OneHotEncoder cannot process string values directly. If your nominal features are strings, you first need to map them to integers. Newer versions accept string categories as they are.
pandas.get_dummies is kind of the opposite. By default, it only converts string columns into a one-hot representation, unless columns are specified.
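On the dummy trap itself: as far as I know, neither tool drops a column by default. pandas.get_dummies keeps one column per category unless drop_first=True is passed, and recent scikit-learn versions of OneHotEncoder have a drop parameter for the same purpose. The sketch below uses plain Python rather than either library (the one_hot helper is hypothetical, written only for illustration) to show why the full encoding is redundant: every row sums to 1, so the columns are collinear with an intercept term.

```python
def one_hot(values, drop_first=False):
    """Return (column_names, rows) for a one-hot encoding of `values`.

    Illustrative helper only; it is not the pandas or scikit-learn API.
    """
    categories = sorted(set(values))
    if drop_first:
        # Dropping one category removes the redundant column.
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


data = ["a", "b", "c", "a"]

full_cols, full_rows = one_hot(data)
reduced_cols, reduced_rows = one_hot(data, drop_first=True)

# With all three columns every row sums to 1, which is exactly the
# linear dependence that the dummy trap refers to.
row_sums = [sum(row) for row in full_rows]
```

Whether the extra column matters depends on the algorithm: it mainly affects unregularized linear models with an intercept, while tree-based models are generally not bothered by it.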

How to generate the same random sequence from a given seed in Delphi [closed]

Closed 6 months ago.
I want to generate the same random sequence (of numbers or characters) based on a given "seed" value.
Using the standard Randomize function does not seem to have such an option.
For example in C# you can initialize the Random function with a seed value (Random seed c#).
How can I achieve something similar in Delphi?
You only need to assign a particular value to the RandSeed global variable.
Actually, I'm almost surprised you asked, because you clearly know of the Randomize function, the documentation for which states the following:
The random number generator should be initialized by making a call to Randomize, or by assigning a value to RandSeed.
The question did not mention thread safety as a requirement, but the OP specified that in comments.
In general, pseudorandom number generators have internal state, in order to calculate the next number. So they cannot be assumed to be thread-safe, and require an instance per thread.
For encapsulated thread-safe random numbers, one alternative is to use a suitably good hash function, such as xxHash, and pass it a seed and a counter that you increment after each call in the thread.
There is a Delphi implementation of xxHash here:
https://github.com/Xor-el/xxHashPascal
For general use, it's easy to make several versions in 1, 2 or 3 dimensions as needed, and return either floating point in 0..1 or integers in a range.
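A minimal sketch of the seed-plus-counter idea, written in Python rather than Delphi. It swaps xxHash for hashlib.blake2b from the standard library purely so the example is self-contained, and the CounterRandom class and its method names are hypothetical. Each thread would own its own instance, so no state is shared.

```python
import hashlib
import struct


class CounterRandom:
    """Hypothetical counter-based generator: hash(seed, counter) -> number."""

    def __init__(self, seed):
        self.seed = seed      # must fit in an unsigned 64-bit integer
        self.counter = 0      # incremented after every call

    def _next_u64(self):
        # Hash the (seed, counter) pair and read 8 bytes as an integer.
        payload = struct.pack("<QQ", self.seed, self.counter)
        self.counter += 1
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def random(self):
        # Keep the top 53 bits to get a float in [0, 1).
        return (self._next_u64() >> 11) / (1 << 53)

    def randint(self, low, high):
        # Inclusive range; the modulo introduces a slight bias.
        return low + self._next_u64() % (high - low + 1)


a = CounterRandom(42)
b = CounterRandom(42)
seq_a = [a.randint(1, 6) for _ in range(5)]
seq_b = [b.randint(1, 6) for _ in range(5)]  # same seed, same sequence
```

Because the output depends only on the seed and the counter, two instances created with the same seed replay the same sequence.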

How to remove irrelevant features in document classification in Weka? [closed]

Closed 1 year ago.
In Weka, text classification produces a lot of features. After applying feature selection, how can I remove the irrelevant features in the Preprocess tab quickly rather than one by one? In text classification the number of features is high, and removing them one by one takes time.
Use the Remove filter for removing ranges of attributes in the Preprocess panel.
But instead of just post-processing the data, you could also change the default parameters of the StringToWordVector filter to produce more meaningful output:
change the minimum term frequency (option: -M, property: minTermFreq)
use a stopwords handler (option: -stopwords-handler, property: stopwordsHandler) like WordsFromFile.

What is the rule for multiple methods(?) on an object (i.e. num.to_s.chars.map{|x| x.to_i**2}.join.to_i)? [closed]

Closed 2 years ago.
What is the structural rule of something like this? I'm new to programming and I don't know the technical term for the ".something's" (methods?).
But, in this example, there are 5 (to_s, chars, map, join, and to_i).
num.to_s.chars.map{|x| x.to_i**2}.join.to_i
Basically, all I am wondering is, what is the structure to building these? I've tried doing some similar and have received errors. So, is there a specific order or structure to these? And is the correct term method?
Ideally, you should first learn the fundamentals of the Ruby language. Ruby is one of the easiest languages to pick up. Check out https://try.ruby-lang.org and you will better understand the following.
It's an expression in which a chain of methods is called, each one on the result of the previous call.
Assuming num is an integer, see the comments below:
num
  .to_s          # to_s converts the object to a string
  .chars         # returns the string's characters as an array
  .map { |x|     # iterates over each digit character in the array
    x.to_i**2    # converts the character to an integer and squares it (** is the exponent operator)
  }
  .join          # map returns a new array; join concatenates its elements into one string
  .to_i          # converts the string back to an integer
So if num is 123, it returns 149: each digit is squared (1, 4, 9) and the results are joined.
You can try it yourself by running each step of this code one by one in irb.

Why do we check if empty before we move? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I see a lot of code like this:
if WA > SPACES
move WA to WA2
end-if
What are the advantages that this is checked? Is it more efficient to check if it is empty than to just move it anyways?
Additional information:
WA and WA2 can be simple structures (without fillers) but also just simple attributes. They are not redefined and are typed as chars or structures of chars. They can be either low-values (semantically NULL) or have alphanumeric content.
Nobody can tell you what the actual reason is except for the people who coded it, but here is a very probable reason:
Usually this is accompanied by an ELSE that would cover what happens if the value is less than spaces, but in this case I would assume that whatever happens to this data later relies on that field NOT being LOW-VALUES or some funky non-displayable control character.
If I had to guess, I would assume that WA2 is initialized to spaces. Doing this check before the move then ensures that nothing lower than spaces is moved to that variable. Remember, less than spaces does not mean empty; it just means that the hex values of that string are less than X'40' (the EBCDIC space). For example, if the string were full of low values, it would be all X'00'. So I would guess that it's more about ensuring that the data is valid than about efficiency.
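To make the byte comparison concrete, here is a rough Python sketch of the check. It assumes an EBCDIC environment (code page 037, where a space is X'40') and fields of equal length; guarded_move is a hypothetical name for illustration and does not model COBOL's padding or truncation on MOVE.

```python
# One EBCDIC space; in code page 037 this is the byte X'40'.
SPACE = " ".encode("cp037")


def guarded_move(wa: bytes, wa2: bytes) -> bytes:
    """Return the new content of WA2 after `IF WA > SPACES MOVE WA TO WA2`.

    Simplified: both fields are assumed to have the same length.
    """
    spaces = SPACE * len(wa)
    if wa > spaces:   # byte-wise comparison, like the collating sequence
        return wa     # the move happens
    return wa2        # WA2 keeps its previous content


low_values = b"\x00" * 3             # LOW-VALUES
text = "ABC".encode("cp037")         # ordinary alphanumeric content
initial_wa2 = SPACE * 3              # WA2 initialized to spaces

after_low = guarded_move(low_values, initial_wa2)  # WA2 stays as spaces
after_text = guarded_move(text, initial_wa2)       # WA2 receives "ABC"
```

Under these assumptions, LOW-VALUES never overwrites the spaces in WA2, which matches the guess that the check protects data validity rather than saving time.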

Structuring data in a conversion app [closed]

Closed 6 years ago.
In an application I need to convert values between metric, British imperial and US imperial, e.g. kilos to stones + pounds (UK) to pounds (US). How best to store the user-inputted data?
Is it better to convert all inputted values to e.g. metric and save as a float, or keep the user inputted data as say a literal string and interpret on each application launch?
The maths/equations etc is all good, it's more knowing what the most efficient structure is for storing values that can be represented in different ways, in a database?
It really comes down to what precision you need. Storing the value as String might be safe but is extremely inefficient. For simple 1 to 1 value conversions it might be efficient enough. For converting thousands of values it probably won't.
I would go with scalar types representing the most complex value the user is able to enter manually. Derive calculations from those original values to avoid losing complexity.
One note: since you're dealing with real world values (I presume), ditch the sign and use the unsigned variants if you're going with scalar value types.
This is the approach I use. Make an NSObject called WeightObject that has 2 properties:
1- the value the user entered (for example 3)
2- the unit the user used (for example kilograms, pounds, etc.).
Lastly, save the object. This way you keep the record exactly as the user entered it, and you add methods inside the object to return the value in kilograms, in pounds, etc.
So later you say float x = myWeightObject.KilosValue
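The same idea as a minimal Python sketch, not the Objective-C class described above. The Weight class, its method names, and the unit codes are hypothetical; Decimal is used here as one way to keep the entered value exact, and the conversion factors are the standard definitions (1 lb = 0.45359237 kg, 1 stone = 14 lb).

```python
from dataclasses import dataclass
from decimal import Decimal

# Kilograms per unit; "st" is 14 lb.
KG_PER_UNIT = {
    "kg": Decimal("1"),
    "lb": Decimal("0.45359237"),
    "st": Decimal("6.35029318"),
}


@dataclass(frozen=True)
class Weight:
    value: Decimal   # exactly what the user entered
    unit: str        # the unit the user chose: "kg", "lb" or "st"

    def to(self, unit: str) -> Decimal:
        # Convert on demand; the stored value is never overwritten.
        return self.value * KG_PER_UNIT[self.unit] / KG_PER_UNIT[unit]

    def stones_and_pounds(self):
        # Split the total in pounds into whole stones plus the remainder.
        total_lb = self.to("lb")
        stones = int(total_lb // 14)
        return stones, total_lb - 14 * stones


entered = Weight(Decimal("150"), "lb")
in_kilos = entered.to("kg")
uk_display = entered.stones_and_pounds()
```

Only the value and unit are persisted, so every other representation is derived from the original input and nothing is lost to repeated conversion.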
