How to generate a private key for Ethereum from the mnemonic? - dart

I generated a new mnemonic using the bip39 package: bip39.generateMnemonic(). The next step is to convert it into a 64-character hex string. I could do that with SHA-256, but it seems a little odd, because I will then apply ECDSA-256 and Keccak-256 to generate a public key.
Is using SHA-256 the right way to generate a private key from the mnemonic? Or should I use another hash function?
P.S. I am a newbie to Ethereum.

BIP-39 covers only the seed. You also need BIP-32 and BIP-44, because one seed generates several private/public key pairs; this scheme is known as a hierarchical deterministic (HD) wallet.
You can find more information in this blog post.
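For comparison, here is a minimal Python sketch (standing in for Dart) of the first two steps as BIP-39 and BIP-32 define them. The mnemonic is not hashed with SHA-256: BIP-39 stretches it into a 64-byte seed with PBKDF2-HMAC-SHA512 (2048 iterations, salt "mnemonic" plus an optional passphrase), and BIP-32 turns that seed into a master private key and chain code with HMAC-SHA512 keyed by "Bitcoin seed". The result is the HD wallet's master key, not yet an Ethereum account key; wallets usually derive that along the BIP-44 path m/44'/60'/0'/0/0, which requires BIP-32 child derivation with secp256k1 point arithmetic and is not shown. The function names are illustrative, and the mnemonic's word list and checksum are not validated here.

```python
import hashlib
import hmac
import unicodedata
from typing import Tuple

# Order n of the secp256k1 group; a valid private key k must satisfy 0 < k < n.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP-39: stretch the mnemonic into a 64-byte seed with PBKDF2-HMAC-SHA512."""
    # BIP-39 NFKD-normalizes both the mnemonic sentence and the salt.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)


def seed_to_master_key(seed: bytes) -> Tuple[bytes, bytes]:
    """BIP-32: split HMAC-SHA512(key=b"Bitcoin seed", seed) into (private key, chain code)."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    private_key, chain_code = digest[:32], digest[32:]
    k = int.from_bytes(private_key, "big")
    if not 0 < k < SECP256K1_N:
        # BIP-32 treats this (astronomically unlikely) case as an invalid seed.
        raise ValueError("invalid master key for this seed")
    return private_key, chain_code
```

Usage: `key, chain = seed_to_master_key(mnemonic_to_seed(words))`, then `key.hex()` gives the 64 hex characters the question asks about, but only as the root of the HD tree, not as the final account key.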

There is a library called TweetNaCl that has several functions, including generating a keyPair from a seed. It is also available in Dart as pineNacl here. (Note that NaCl key pairs are Curve25519/Ed25519 keys, whereas Ethereum uses ECDSA over secp256k1, so a NaCl key pair is not directly usable as an Ethereum account key.)
I have not tried the Dart version myself, but you should be able to generate a keyPair from a seed. You can also see the conversion of a mnemonic into a 32-byte seed here.
After scrolling through some really confusing documentation for its Dart implementation, I believe you should be able to generate a keyPair from a seed with a function like this:


Processing multiline events from a text file in Dataflow

I am attempting to build a Dataflow pipeline to process a text file that contains events spanning multiple lines. The Dataflow SDK's TextIO class assumes each line is a new event.
My plan is to create a new TextReader and register it with the DataPipelineRunner. This new reader will know how to aggregate the multiple lines into a single line.
I am pretty sure that this approach will work but I am wondering if this is the right way to do it or if there is a simpler solution?
The text I am trying to parse is:
==============> len:45 pktype:4 mtype:2
SYMBOL: USOCSTIA151632.00
OPEN_INT: 212
PR_OPEN_INTEREST: 212
TIME_STAMP: 04/10/2015 06:30:17:420 val:1428661817
The result should be the last 4 lines concatenated together and the first line dropped.
Best regards,
Peter
Note that TextReader is an internal implementation detail class, so subclassing it would be highly discouraged and challenging to do properly.
The recommended way to define a new file-based format like yours is to subclass FileBasedSource using the user-defined source API.
In your case, I would recommend basing your class on the LineIO example from the documentation and wrapping the LineReader defined there in your own class, which would use LineReader as a helper for reading individual lines, but:
In startReading() it would skip until the line starting with "====>"
In readNextRecord() it would read lines until the next "====>" and bundle them into a single record.
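The record-bundling logic that startReading() and readNextRecord() would implement can be sketched in Python, independently of the FileBasedSource API. This is a hypothetical helper, not part of the Dataflow SDK. It assumes a header is any line matching ^=+> (the "====>" above is shorthand for the real header, which has more '=' characters), drops the header, and joins the following lines with a space until the next header.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Assumption: a header is any line that starts with one or more '=' followed by '>'.
HEADER = re.compile(r"^=+>")


def group_records(lines: Iterable[str], sep: str = " ") -> Iterator[str]:
    """Drop header lines and join the lines between consecutive headers into one record."""
    body: Optional[List[str]] = None  # stays None until the first header is seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if HEADER.match(line):
            if body:  # emit the previous event, if it had any lines
                yield sep.join(body)
            body = []
        elif body is not None and line:
            body.append(line)
        # Lines before the first header are skipped, as startReading() would do.
    if body:
        yield sep.join(body)
```

For the sample event in the question, this yields the single record "SYMBOL: USOCSTIA151632.00 OPEN_INT: 212 PR_OPEN_INTEREST: 212 TIME_STAMP: 04/10/2015 06:30:17:420 val:1428661817".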
Please make sure to carefully read the documentation to FileBasedSource and FileBasedReader: the parallelization mechanism relies on the consistency properties described there, which your format has to satisfy, for ensuring that records are not duplicated or omitted on the boundaries between adjacent processing shards. XmlSource tests are a good example of how to unit-test these properties.
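One common way to satisfy the boundary property is to say that a record belongs to the shard in which its header starts: a reader for the range [start, end) skips to the first header at or after start and keeps emitting records while the header offset is before end, even if the last record runs past end. The Python sketch below illustrates that ownership rule on an in-memory string, with character offsets standing in for byte offsets; records_in_range is a hypothetical name, and a real FileBasedReader would read from a channel rather than a whole-file string.

```python
import re
from typing import List, Tuple

# Assumption: a header is a line starting with '=' characters followed by '>'.
# MULTILINE makes '^' match at every line start.
HEADER = re.compile(r"^=+>", re.MULTILINE)


def records_in_range(data: str, start: int, end: int) -> List[Tuple[int, str]]:
    """Return (offset, text) of every record whose header starts in [start, end)."""
    out: List[Tuple[int, str]] = []
    # '^' only matches at a real line start, so a shard that begins mid-record
    # skips ahead to the next header instead of emitting a partial record.
    m = HEADER.search(data, start)
    while m is not None and m.start() < end:
        nxt = HEADER.search(data, m.end())
        stop = nxt.start() if nxt is not None else len(data)
        # The record may extend past 'end'; it still belongs to this shard.
        out.append((m.start(), data[m.start():stop]))
        m = nxt
    return out
```

Whatever split points are chosen, concatenating the records of adjacent shards gives exactly the records of the whole file, with nothing duplicated or dropped.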
Please tell us how it goes and report back with any problems or questions - we are very interested in feedback on this API.

Different coders for the same class in dataflow job

I'm trying to use different coders for the same class for two different scenarios:
Reading from JSON input files - using data = TextIO.Read.from(options.getInput()).withCoder(new Coder1())
Elsewhere in the job, I want the class to be persisted with SerializableCoder, using data.setCoder(SerializableCoder.of(MyClass.class))
It works locally, but fails when run in the cloud with
Caused by: java.io.StreamCorruptedException: invalid stream header: 7B227365.
Is this a supported scenario? The reason for doing this in the first place is to avoid reading and writing the JSON format for intermediate data, while still making reading from input files efficient (UTF-8 parsing is part of the JSON reader, so it can read from an InputStream directly).
Clarifications:
Coder1 is my coder.
The other coder is a SerializableCoder.of(MyClass.class)
How does the system choose which coder to use? The two formats are binary incompatible, and it looks like due to some optimization, the second coder is used for data format which can only be read by the first coder.
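The header bytes in the exception support that reading. Java serialization streams, which SerializableCoder decodes through Java's ObjectInputStream, begin with the magic number 0xACED followed by version 0x0005, while the reported header 7B227365 decodes to the ASCII text {"se, the start of a JSON object. A small Python sketch of that check (describe_stream_header is a hypothetical helper):

```python
import struct

# java.io.ObjectStreamConstants: STREAM_MAGIC = 0xACED, STREAM_VERSION = 5.
JAVA_STREAM_HEADER = struct.pack(">HH", 0xACED, 5)


def describe_stream_header(header_hex: str) -> str:
    """Classify the 4 header bytes that ObjectInputStream reports in the exception."""
    raw = bytes.fromhex(header_hex)
    if raw == JAVA_STREAM_HEADER:
        return "Java serialization stream"
    return "not Java serialization; as ASCII: " + raw.decode("ascii", errors="replace")


print(describe_stream_header("7B227365"))  # not Java serialization; as ASCII: {"se
```

So the bytes that SerializableCoder tried to read were JSON text, which is consistent with data written by the JSON coder being decoded by the serialization coder.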
Yes, using two different coders like that should work. (With the caveat that the coder in #2 will only be used if the system chooses to persist 'data' instead of optimizing it into surrounding computations.)
Are you using your own Coders or ones provided by the Dataflow SDK? Quick caveat on TextIO -- because it uses newlines to encode element boundaries, you'll get into trouble if you use a coder that produces encoded values containing something that can be mistaken for a newline. You really should only use textual encodings within TextIO. We're hoping to make that clearer in the future.
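The newline caveat is easy to reproduce with any binary encoding. In the Python sketch below, a hypothetical 4-byte big-endian integer encoding stands in for a binary coder: the value 10 encodes to bytes ending in 0x0A (a newline), so writing records one per line and splitting them again yields the wrong number of records.

```python
import struct
from typing import List


def encode_int(value: int) -> bytes:
    """Hypothetical binary coder: a 4-byte big-endian signed integer."""
    return struct.pack(">i", value)


def roundtrip_through_lines(values: List[int]) -> List[bytes]:
    """Write encoded values one per line, then split on newlines as a text reader would."""
    blob = b"\n".join(encode_int(v) for v in values)
    return blob.split(b"\n")


print(len(roundtrip_through_lines([1, 2, 3])))   # 3 records, as expected
print(len(roundtrip_through_lines([1, 10, 2])))  # 4 "records": 10 encodes as 00 00 00 0A
```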

Tool to manage string literals used for parsing JSON

We parse a lot of JSON in our app - without the back-end, it would be a pretty useless app. I know this goes for a bunch of other apps out there as well. In order to parse JSON, we need a list of keys to get to the data. I'd like to know what is considered 'best practice' or at least 'damn good practice' for managing these paths/string literals. Is there a tool out there that helps manage such keys and reduces duplication?
Hard-coding them is definitely not an option, although, to be frank, if our back-end programmers change a key, a simple find/replace in Xcode (or whatever IDE you're using) would in principle suffice. It's ugly and unclean, though, and I just feel dirty putting string literals all over my code.
What I'm currently doing now is putting them all into my PCH file, which means I end up with:
#define kBookmarksSearchResultsIDFieldName @"business.id"
#define kBookmarksSearchResultsNameFieldName @"business.name"
#define kBookmarksSearchResultsThumbnailURLFieldName @"business.display_image.images.small_mobile.source"
#define kBookmarksBusinessCategoryArrayFieldName @"business.categories"
This gets unwieldy real fast though since now I have around a thousand lines of these things in my PCH file.
The other option I'm considering is breaking these up into separate .h files - but then if two components of my app end up using the same key (for example, a business object is embedded into the JSON for a bookmark, or for a review of that business) then I have to import the .h that contains the JSON paths for the business object. So in this case I'm still importing all of the same data, it's just the file organization that's cleaner.
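One way to cut the duplication in the separate-header approach is to define each JSON object's keys once, relative to that object, and compose full paths from wherever the object is embedded (a bookmark, a review, and so on). A Python sketch of the idea, with hypothetical class names; in Objective-C the same pattern would be one set of NSString constants per object plus a helper that joins them, with the dotted paths resolved through key-value coding's valueForKeyPath:.

```python
from typing import Any, Mapping


class BusinessKeys:
    """Key paths relative to a business object; each key is written exactly once."""
    ID = "id"
    NAME = "name"
    THUMBNAIL_URL = "display_image.images.small_mobile.source"
    CATEGORIES = "categories"


class BookmarkKeys:
    """Where objects are embedded inside a bookmark payload."""
    BUSINESS = "business"


def key_path(*parts: str) -> str:
    """Compose a dotted key path, e.g. key_path("business", "id") -> "business.id"."""
    return ".".join(parts)


def value_for_key_path(obj: Any, path: str) -> Any:
    """Follow a dotted key path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(obj, Mapping) or key not in obj:
            return None
        obj = obj[key]
    return obj
```

A review payload that embeds the same business object would reuse BusinessKeys unchanged and only add its own prefix constant.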
My objectives are:
Easy management of string literals used for parsing JSON
Reduce the amount of duplication needed
Easy changing/replacement of JSON paths if/when needed
Is option 3 that I listed above (separate .h files) my best option? What do you guys use, and am I missing an easy tool out there (and no, JSONModel isn't an option because of the way it requires your JSON keys to match your ivar/property names - our back-end supports a number of platforms so we can't change the JSON keys just for iOS).
Look into using a library such as RestKit which allows you to map a JSON document to a set of Objective-C classes. This means you can read the document in and get an array of objects you can manipulate by properties instead of having to keep track of key names. It's much easier, and Xcode will autocomplete your property names as you work with the classes.
It takes some setup, but you only have to do it once. :)
Just to update this answer - there's a very cool library called Mantle - not perfect, there are some issues with typecasting but still a very solid effort.
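The mapping approach boils down to one table per model class that pairs each property with a JSON key path, so key names live in exactly one place and the rest of the app works with typed properties. A minimal Python sketch of that idea follows; it is not RestKit's or Mantle's API, and Business, BUSINESS_MAPPING and map_object are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Optional, Type, TypeVar

T = TypeVar("T")


def value_for_key_path(obj: Any, path: str) -> Any:
    """Follow a dotted key path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(obj, Mapping) or key not in obj:
            return None
        obj = obj[key]
    return obj


@dataclass
class Business:
    business_id: Optional[str]
    name: Optional[str]
    thumbnail_url: Optional[str]
    categories: Optional[List[str]]


# The only place the JSON key paths for a business appear: property -> key path.
BUSINESS_MAPPING: Dict[str, str] = {
    "business_id": "id",
    "name": "name",
    "thumbnail_url": "display_image.images.small_mobile.source",
    "categories": "categories",
}


def map_object(cls: Type[T], mapping: Mapping[str, str], payload: Mapping[str, Any]) -> T:
    """Build an instance of cls by resolving each mapped key path in payload."""
    return cls(**{prop: value_for_key_path(payload, path) for prop, path in mapping.items()})
```

If the back end renames a key, only the mapping table changes; code that reads business.name is untouched.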

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose any options, but it uses C++ underneath the Objective-C wrapper. I assume there must be some sort of setting I could change to get more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
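If the list of compounds is long, the user dictionary source can be generated rather than typed by hand. A Python sketch, assuming each part comes with a cost and the ipadic feature string (POS fields, base form, reading, pronunciation) copied from the main dictionary; the context-ID columns are left empty as in the example entries in this answer, and the file is written as UTF-8 to match the encoding passed to mecab-dict-index. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def user_dic_csv(entries: Iterable[Tuple[str, int, str]]) -> str:
    """Build MeCab user-dictionary CSV text from (surface, cost, features) tuples.

    'features' is the comma-separated feature string, e.g. POS fields, base form,
    reading, and pronunciation. Context-ID columns are left empty.
    """
    lines = []
    for surface, cost, features in entries:
        if "," in surface:
            raise ValueError(f"surface form may not contain a comma: {surface!r}")
        lines.append(f"{surface},,,{cost},{features}")
    return "".join(line + "\n" for line in lines)


def write_user_dic(path: str, entries: Iterable[Tuple[str, int, str]]) -> None:
    """Write the CSV source as UTF-8, matching '-f utf-8' for mecab-dict-index."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(user_dic_csv(entries))
```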
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries already exist in the default dictionary, but by adding them to a user dictionary with a relatively low cost (the fourth field; I've used 100 for both, and the lower the cost of the parts, the more likely the compound is to be split), you can get mecab to prefer the parts over the whole.
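Why a lower cost leads to splitting can be illustrated with a toy minimum-cost segmentation. MeCab picks the lattice path with the lowest total cost; the Python sketch below keeps only word costs and ignores the connection costs between adjacent parts of speech that MeCab also adds, and the numbers are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def min_cost_segmentation(text: str, word_costs: Dict[str, int]) -> Tuple[int, List[str]]:
    """Split text into dictionary words with the lowest total word cost (toy model)."""
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i] = cheapest cost of segmenting text[:i]
    back = [0] * (n + 1)         # back[i] = start index of the last word in that split
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            cost = word_costs.get(text[start:end])
            if cost is not None and best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    if math.isinf(best[n]):
        raise ValueError("text cannot be segmented with this dictionary")
    words: List[str] = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return int(best[n]), words[::-1]


costs = {"吉本": 4000, "興業": 4000, "吉本興業": 3000}  # made-up costs
print(min_cost_segmentation("吉本興業", costs))  # (3000, ['吉本興業'])
costs.update({"吉本": 100, "興業": 100})            # like the user dictionary entries
print(min_cost_segmentation("吉本興業", costs))  # (200, ['吉本', '興業'])
```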
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulty finding the mecab-dict-index tool. It is best to install from source.
The default dictionary is in /usr/lib64/mecab/dic/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulty finding it, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (it need not exist yet).
The user dictionary source is encoded in UTF-8 (-f option), and the compiled dictionary is written in UTF-8 (-t option) to match the system dictionary. If either is wrong for your setup, you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = /home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
A more thorough fix would be to get the source of the default dictionary, manually remove all compound nouns you want split, and then recompile the dictionary. As a general remark, using a CJK tokenizer in a serious production application over a longer period of time usually involves a certain amount of regular dictionary maintenance (adding/removing entries).

HMAC-SHA-512 implemention for ActionScript

As mentioned in the title, I would like to find an implementation of HMAC-SHA-512 written in ActionScript. I was able to find a library that provides HMAC-SHA-256 along with other functions; however, I am looking for HMAC-SHA-512 specifically.
Thank you
Edit:
Or, since ActionScript and JavaScript have the same origin, can someone port this JavaScript version to ActionScript?
http://pajhome.org.uk/crypt/md5/sha512.html
Edit 2:
I have already ported the code from JavaScript to ActionScript. The code can be found in one of the answers to this question:
Porting SHA-512 Javascript implementation to Actionscript
Checkout this library:
http://code.google.com/p/as3crypto/
Though it only does:
SHA-256, SHA-224, SHA-1, MD5, and MD2
So I guess that doesn't answer your question.
But it's the best crypto library for ActionScript I've seen.
The implementation you link to doesn't seem to be using any features that aren't supported by ActionScript 3. Just surround the whole thing with public class SHA512 { }, and prefix the first five functions with public.
Edit: You will also need to convert function int64 to its own class. Using Number instead will not work reliably: it is a 64-bit IEEE 754 double with a 53-bit significand, so it loses precision for 64-bit integers.
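A sketch of that int64 conversion in Python, standing in for ActionScript: each 64-bit word is stored as two unsigned 32-bit halves and added with an explicit carry, wrapping modulo 2^64 as SHA-512 requires. The function names are hypothetical, not taken from the linked code.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split64(x: int) -> Tuple[int, int]:
    """Split an unsigned 64-bit integer into (high, low) 32-bit halves."""
    return (x >> 32) & MASK32, x & MASK32


def join64(hi: int, lo: int) -> int:
    """Reassemble (high, low) 32-bit halves into one 64-bit integer."""
    return (hi << 32) | lo


def add64(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Add two (hi, lo) pairs modulo 2**64, propagating the carry from the low half."""
    lo = a[1] + b[1]
    carry = lo >> 32
    hi = (a[0] + b[0] + carry) & MASK32
    return hi, lo & MASK32
```

Python's float is the same IEEE 754 double as ActionScript's Number, and float(2**53 + 1) == float(2**53) evaluates to True, which is exactly the precision loss that the two-halves representation avoids.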
Just found all of SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512) implemented at http://code.google.com/p/flame/. It also provides an HMAC implementation. I haven't tried it yet, but it looks like what you're looking for.
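Whichever SHA-512 implementation you end up with, HMAC itself is a thin wrapper around the hash (RFC 2104), so an existing HMAC-SHA-256 routine can be adapted. The detail that usually bites is the block size: SHA-512 works on 128-byte blocks, so the key must be padded (or, if longer, first hashed) to 128 bytes rather than the 64 used with SHA-256. A Python sketch of the construction, which could be ported line by line:

```python
import hashlib

SHA512_BLOCK_SIZE = 128  # bytes; SHA-256 uses 64-byte blocks


def hmac_sha512(key: bytes, message: bytes) -> bytes:
    """HMAC (RFC 2104) instantiated with SHA-512."""
    if len(key) > SHA512_BLOCK_SIZE:
        key = hashlib.sha512(key).digest()  # long keys are hashed first
    key = key.ljust(SHA512_BLOCK_SIZE, b"\x00")  # then zero-padded to one block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha512(inner_pad + message).digest()
    return hashlib.sha512(outer_pad + inner).digest()
```

The output matches Python's own hmac.new(key, message, hashlib.sha512), which makes a convenient reference when testing a port.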
