Does random forest in machine learning support string features? - machine-learning

I have some data like below:
username, password, valid
kramer, abcd1234, 1
dan,123123123,0
As you can see, the feature values can be strings, so scikit-learn's RandomForestClassifier raises an error like:
ValueError: could not convert string to float: 'hEZ7P|N*Akem'
I am considering two solutions.
Convert the strings to floats, since the characters can be represented by their ASCII codes.
Find another algorithm that supports string features.
Which one is better? Do you have any suggestions?

I think you would need to encode the categorical features with something like one-hot encoding, since the model needs a numerical representation. See this answer for more:
https://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest
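The encoding suggested above can be sketched in plain Python. This is only a minimal illustration, not scikit-learn's own API: the helper name one_hot_encode and the sample column are assumptions made for the example. It maps each distinct string in a column to a 0/1 indicator vector, which is the kind of numeric input a classifier such as RandomForestClassifier can accept.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 indicator column per distinct string.

    Hypothetical helper for illustration; not part of any library.
    """
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # mark the column belonging to this string
        rows.append(row)
    return categories, rows


# Sample column, loosely based on the usernames in the question.
usernames = ["kramer", "dan", "kramer"]
categories, encoded = one_hot_encode(usernames)
print(categories)
print(encoded)
```

One-hot encoding is mainly useful when values repeat across rows. A column in which nearly every value is unique, such as free-form passwords, probably gives the model little to generalize from.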

Related

FME change data format from string to numeric

I'm struggling with a really simple problem: I need to convert an attribute from string to numeric in FME. I have tried using the arithmetic editor, but every time I export to GIS I get a string. It seems that when one uses the statistics calculator, the result is numeric.
Any ideas? As I am all out of them.
Ashton

Convert hash value into string in Objective-C

I have a hash value and I want to convert it into string format, but I do not know how to do that.
Here is the hash value
7616db6c232292d2e56a2de9da49ea810d5bb80d53c10e7b07d9521dc88b3177
Hashing is a one-way algorithm, so that is not possible, sorry.
You can't. A hash is a one-way function. Plus, two strings could theoretically generate the same hash.
However, if you had a large enough precomputed list of hashes, one for each candidate string, you could potentially look your hash up and recover a string. It would require thousands of GB of entries and a lot of time to compute each and every hash, so in practice it isn't possible.
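The precomputed-list idea can be sketched as a dictionary lookup. This is a toy illustration under stated assumptions: the 64 hex characters in the question are consistent with a 256-bit digest, so SHA-256 is assumed here, but the actual algorithm is not stated. The candidate list and the name build_lookup_table are made up for the example.

```python
import hashlib


def build_lookup_table(candidates):
    """Map hex digest -> original string for every candidate (hypothetical helper)."""
    return {
        hashlib.sha256(text.encode("utf-8")).hexdigest(): text
        for text in candidates
    }


# Tiny illustrative candidate list; a real table would need vastly more entries.
candidates = ["password", "letmein", "hunter2"]
table = build_lookup_table(candidates)

# A hash whose input happens to be in the table can be looked up...
target = hashlib.sha256(b"letmein").hexdigest()
print(table.get(target))

# ...but any input outside the table cannot be recovered this way.
unknown = hashlib.sha256(b"not in the list").hexdigest()
print(table.get(unknown))
```

The lookup only succeeds for strings that were hashed in advance, which is why the table size becomes impractical for arbitrary inputs.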

How to extract differing part from many strings?

I have many strings: at least 4, at most 12. All the strings have a differing part in the middle. For each string, I want to extract the part that differs from all the other strings. How can I do this?
http://users.cybercity.dk/~dsl8950/ruby/diff.html
That's all you need.
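As an alternative to a full diff, one possible approach is to strip the prefix and suffix shared by all strings and keep what is left. The sketch below is in Python rather than the Ruby of the linked page, and it assumes the strings differ only in one contiguous middle section. The function name differing_parts and the sample file names are made up for illustration.

```python
import os


def differing_parts(strings):
    """Return the middle part of each string after removing the shared prefix and suffix."""
    prefix = os.path.commonprefix(strings)
    # Drop the shared prefix first so the prefix and suffix can never overlap.
    remainders = [s[len(prefix):] for s in strings]
    # The common suffix is the common prefix of the reversed remainders.
    suffix_len = len(os.path.commonprefix([r[::-1] for r in remainders]))
    return [r[:len(r) - suffix_len] for r in remainders]


names = ["img_cat_small.png", "img_dog_small.png", "img_bird_small.png"]
print(differing_parts(names))
```

Note that characters the differing parts happen to share at their own start or end are treated as part of the common prefix or suffix, so the result is the smallest differing window rather than a semantic token.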

What's the character set of SHA1?

I need to know which characters SHA1 will generate for me.
Is it possible to know the character set of SHA1? Or, if it's configurable, what is its default character set?
Thank you.
SHA-1 doesn't generate text, it generates a binary hash (like most digests), so it doesn't have a charset (or care about the input's charset for that matter).
You can represent it as text (a string representation of the hex value, and base64 are popular) if you want, especially if you need to transfer it over the network or display it to users. That encoding is up to you.
I'm fairly sure it's just binary data rather than any character encoding. You could then encode that in Base64 if you like.
The SHA1 hash algorithm takes a stream of bytes as input and calculates a 160-bit digest. Command-line versions output the digest as a hexadecimal string. No charsets are involved.
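The distinction between the binary digest and its text representations can be seen with Python's standard hashlib and base64 modules. This is a small sketch; the input b"abc" is just an example value.

```python
import base64
import hashlib

# The raw digest is 20 bytes (160 bits) of binary data, not text.
digest = hashlib.sha1(b"abc").digest()

# Two common text representations of those same bytes.
hex_text = digest.hex()                               # 40 characters from 0-9a-f
b64_text = base64.b64encode(digest).decode("ascii")   # 28 Base64 characters

print(len(digest), type(digest).__name__)
print(hex_text)
print(b64_text)
```

The set of characters you end up with depends entirely on which representation you pick, not on SHA1 itself.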

Convert SHA1 back to string

I have a user model in my app, and my password field uses SHA1. When I get the SHA1 value from the DB, I want to turn it back into the original string. How do I do that?
You can't: SHA1 is a one-way hash. Given the output of SHA1(X), it is not possible to retrieve X (at least, not without a brute-force search or a dictionary/rainbow table scan).
A very simple way of thinking about this is to imagine I give you a set of three-digit numbers to add up, and you tell me the final two digits of that sum. It's not possible from those two digits for me to work out exactly which numbers you started out with.
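The analogy can be made concrete with a few lines of Python. The numbers below are arbitrary examples chosen for illustration: two different sets of three-digit numbers yield the same final two digits, so the digits alone cannot tell you which set was used.

```python
def last_two_digits_of_sum(numbers):
    """Toy 'hash': add the numbers up and keep only the final two digits."""
    return sum(numbers) % 100


first_set = [123, 456, 789]    # sum is 1368
second_set = [100, 200, 768]   # sum is 1068

print(last_two_digits_of_sum(first_set))
print(last_two_digits_of_sum(second_set))
```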
See also
Is it possible to reverse a sha1?
Decode sha1 string to normal string
Though they relate to MD5, these other questions may also enlighten you:
Reversing an MD5 Hash
How can it be impossible to “decrypt” an MD5 hash?
You can't -- that's the point of SHA1, MD5, etc. Most of those are one-way hashes for security. If it could be reversed, then anyone who gained access to your database could get all of the passwords. That would be bad.
Instead of trying to dehash the values in your database, hash the password attempt and compare it to the hashed value in the database.
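A minimal sketch of that hash-and-compare check, using only Python's standard library. The function names sha1_hex and verify_password are hypothetical, and plain unsalted SHA1 is used only to mirror the setup described in the question; systems that store passwords typically use a salted, deliberately slow password-hashing function instead.

```python
import hashlib
import hmac


def sha1_hex(password):
    """Hash a password the way the question describes: plain SHA1, hex-encoded."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def verify_password(attempt, stored_hash):
    """Hash the attempt and compare it with the stored hash; nothing is reversed."""
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(sha1_hex(attempt), stored_hash)


# Stand-in for the value that would be read from the database.
stored_hash = sha1_hex("correct horse")

print(verify_password("correct horse", stored_hash))
print(verify_password("wrong guess", stored_hash))
```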
If you're talking about this from a practical viewpoint, just give up now and consider it impossible. Finding the original string is impossible (except by accident). Most of the point of a cryptographically secure hash is to ensure you can't find any other string that produces the same hash either.
If you're interested in research into secure hash algorithms: finding a string that will produce a given hash is called finding a "preimage". If you can manage to do so (with reasonable computational complexity) for SHA-1, you'll probably become reasonably famous among cryptanalysis researchers. The best "break" against SHA-1 that's currently known is a way to find two input strings that produce the same hash, but 1) it's computationally quite expensive (think in terms of a number of machines running 24/7 for months at a time to find one such pair), and 2) it does not work for an arbitrary hash value -- it finds one of a special class of input strings for which a matching pair is (relatively) easy to find.
SHA is a hashing algorithm. You can compare the hash of a user-supplied input with the stored hash, but you can't easily reverse the process (rebuild the original string from the stored hash), unless you resort to brute force or rainbow tables (both extremely slow when the input is sufficiently long).
You can't do that with SHA-1. But, given what you need to do, you can try using AES instead. AES allows encryption and decryption.
