Is there a model packaging scheme? - machine-learning

I'm wondering if there is some kind of AI model packaging scheme similar to software packaging(npm, maven, pub, apt, etc...).
I did some search and found that there's an option for application developers to use a pre-trained model from TensorFlow Hub, Hugging Face, etc.
Generally they provide a python library that can be used in other application projects. With the tool provided, I see the following options of serving AI models.
Download the pretrained model through python library and use in python application projects.
Open an API server (REST or any other kinds) that provides model inference.
Download the model file and use any kind of machine learning libraries&frameworks of the language used by the application.
Given the options above, it makes sense to have some kind of model versioning scheme somewhere, similar to software semantic versioning or software packaging scheme.
Is it generally the job of the application developer to manage the model versioning? (Checking the diff of binary file of the model?)
I did some search online but was unable to find relevant information due to insufficient background knowledge.

Related

What is the simpest way to train pytroch-lightning model over a bunch of servers with Dask?

I have access to a couple dozens Dask servers without GPU but with complete control of the software (can wipe them and install something different) and want to accelerate pytorch-lightning model training.
What could be a possible solution to integrate them with as little additional code possible?
I've researched this topic a bit, finding possible options, cannot determine which one to choose:
#
option
info 
pro 
con 
1.
dask-pytorch-ddp
Package to be used to writing models with easier integration into Dask 
will likely work
cannot use existing model out of the box, need rewriting the model itself
2.
PL docs, on-prem cluster (intermediate)
multiple copies of pytorch lighning on the network 
simples way according to lightning docs
fiddly to launch according to the docs
3.
PL docs, SLURM cluster
wipe/redeploy cluster, setup SLURM 
less fiddly to launch individual jobs
need to redeploy the cluster OS/software
4.
Pytorch + dask
officially supported and documented use of Skorch 
has a package handling this - skorch
will need to use pytorch, not lightning
Are there any more options or tutorials to learn about this?

AWS SWF vs Flow Framework

I am familiar with the Concept of amazon SWF . I can see many SDK in different languages to use SWF services. Also, amazon Flow Framework is a set of library to implement distributed applications . Currently this Flow Framework is available in Java and Ruby . Then how can we write distributed applications using SWF in other languages like python , php etc. Does this mean amazon provides the framework in Java and Ruby only , rest of the languages have other vendor's libraries ? Please explain .
You are right that AWS currently only provides high-level frameworks for Ruby and Java ("Flow" frameworks). Low-level access to SWF is available in most (all?) official SDKs though: boto2/3 for Python, go-sdk, etc.
When using SWF, you'll find yourself implementing mainly two types of programs: "activity workers" and "deciders" (http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dev-actors.html).
Using the Flow framework is not mandatory, but it helps implementing deciders by providing high-level abstractions for describing synchronisation points, defining which tasks can be run in parallel, retries, etc. There are also non-official libraries (I'm personally maintaining one for my company, "simpleflow").
If you want to use other languages for deciders, I recommend you try to use an existing framework first, then see if you want to implement this yourself (it's not trivial from my experience).
If you want to implement activities in other languages, I recommend you start using the Flow framework end-to-end, and then you can either 1/ fork and use your favorite language as a subprocess of Ruby/Java Flow workers, or 2/ mimic the serialisation logic of the Flow framework and implement workers directly yourself with low-level APIs (which is simple: poll for an activity, do work, then respond to SWF with the result).

Moses - Online Integration

We're actually looking to integrate Moses into our localization workflow. Our application is in Java and we're looking at using Moses' functionalities using xml-rpc calls.
Specifically, we're looking at APIs for:
Incremental training (i.e. Avoid having to retrain the model
from scratch every time we wish to use some new training data)
Domain-specific training (i.e. It should maintain separate
phrase tables for each domain that the input data belongs),
Decoding
The tutorial says that these can be achieved via xml-rpc calls. But, I don't find any examples or clear ways to do them. Can someone please provide some examples?
Also, I would like to know if the training and decoding phases can be done in a
distributed manner.
Thanks!
this question is perfectly suitable for moses mailing list:
http://www.statmt.org/moses/?n=Moses.MailingLists
moses server documentation (via xml-rpc):
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc28
However, I have better experiences with: moses/contrib/web/bin/daemon.pl which makes server as well, and you communicate via tcp stream.
General examples are harder to find(everyone has different enviroment,...), but make your question more specific and send it to moses mailing list. (e.g. someone had a problem with server installation: http://comments.gmane.org/gmane.comp.nlp.moses.user/7242 )

Tools to manage semantic webs

I've seen a lot frameworks to create a semantic web (or rather the model below it). What tools are there to create a small semantic web or repository on the desktop, for example for personal information management.
Please include information how easy these are to use for a casual user, (in contrast to someone who has worked in this area for years). So I'd like to hear which tools can create a repository without a lot of types and where you can type the nodes later, as you learn about your problem domain.
For personal semantic information management on the desktop there is NEPOMUK. There are two versions, one embedded in kde4, this lets you tag, rate and comment things such as files, folders, pictures, mp3s, etc. on the desktop across all applications.
Another version is written in Java and is OS independent, this is more of a research prototype. It has more features, but is overall less stable.
For KDE-Nepomuk see http://nepomuk.kde.org/
For Java-Nepomuk see http://dev.nepomuk.semanticdesktop.org/ and http://dev.nepomuk.semanticdesktop.org/download/ for downloads (the DFKI version is better)
Extensive list of semantic web tools
Also check out Protege
If you need to create a small model, then I suggest that you use topbraid. I have used for creating much larger models and I know people who have used to create humongous models. It comes packaged with a set of reasoners and provides ability to plug-in custom reasoner and in case if you decide to make your model larger, you can even integrate Topbraid with a triple store like Allegrograph.
And since its based on eclipse, to get started with it is relatively easier.
For developers who are spoiled working in more matured programming languages like Java (IDEA ? anyone), topbraid is the closest tool to an actual IDE.
Chandler is a "a notebook you can organize, back up and share!" It seems to be pretty simple to use.
OS: Windows, Mac, Linux

Metamodelling tools

What tools are available for metamodelling?
Especially for developing diagram editors, at the moment trying out Eclipse GMF
Wondering what other options are out there?
Any comparison available?
Your question is simply too broad for a single answer - due to many aspects.
First, meta-modelling is not a set term, but rather a very fuzzy thing, including modelling models of models and reaching out to terms like MDA.
Second, there are numerous options to developing diagram editors - going the Eclipse way is surely a nice option.
To get you at least started in the Eclipse department:
have a look at MOF, that is architecture for "meta-modelling" from the OMG (the guys, that maintain UML)
from there approach EMOF, a sub set which is supported by the Eclipse Modelling Framework in the incarnation of Ecore.
building something on top of GMF might be indeed a good idea, because that's the way existing diagram editors for the Eclipse platform take (e.g. Omondo's EclipseUML)
there are a lot of tools existing in the Eclipse environment, that can utilize Ecore - I simply hope, that GMF builts on top of Ecore itself.
Dia has an API for this - I was able to fairly trivially frig their UML editor into a basic ER modelling tool by changing the arrow styles. With a DB reversengineering tool I found in sourceforge (took the schema and spat out dia files) you could use this to document databases. While what I did was fairly trivial, the API was quite straightforward and it didn't take me that long to work out how to make the change.
If you're of a mind to try out Smalltalk There used to be a Smalltalk meta-case framework called DOME which does this sort of thing. If you download VisualWorks, DOME is one of the contributed packages.
GMF is a nice example. At the core of this sits EMF/Ecore, like computerkram sais. Ecore is also used for the base of Eclipse's UML2 . The prestige use case and proof of concept for GMF is certainly UML2 Tools.
Although generally a UML tool, I would look at StarUML. It supports additional modules beyond what are already built in. If it doesn't have what you need built in or as a module, I supposed you could make your own, but I don't know how difficult that is.
Meta-modeling is mostly done in Smalltalk.
You might want to take a look at MOOSE (http://moose.unibe.ch). There are a lot of tools being developed for program understanding. Most are Smalltalk based. There is also some java and c++ work.
Two of the most impressive tools are CodeCity and Mondrian. CodeCity can visualize code development over time, Mondrian provides scriptable visualization technology.
And of course there is the classic HotDraw, which is also available in java.
For web development there is also Magritte, providing meta-descriptions for Seaside.
I would strongly recommend you look into DSM (Domain Specific Modeling) as a general topic, meta-modeling is directly related. There are eclipse based tools like GMF that currently require java coding, but integrate nicely with other eclipse tools and UML. However there are two other classes out there.
MetaCase which I will call a pure DSM tool as it focuses on allowing a developer/modeler with out nearly as much coding create a usable graphical model. Additionally it can be easily deployed for others to use. GMF and Microsoft's Beta software factory/DSM tool fall into this category.
Pure Meta-modeling tools which are not intended for DSM tooling, code generation, and the like. I do not follow these tools as closely as I am interested in applications that generate tooling for SMEs, Domain Experts, and others to use and contribute value to an active project not modeling for models sake, or just documentation and theory.
If you want to learn more about number 1, the tooling applications for DSMs/Meta-modeling, then check out my post "DSMForum.org great resources, worth a look." or just navigate directly to the DSMForum.org
In case you are interested in something that is related to modelling and not generation of code, have a look at adoxx.org. As a metamodelling platform it does provide functionalities and mechanisms to quickly develop your own DSL and allows you to focus on the models needs (business requirements, conceptual level design/specification). There is an active community from academia and practice involved developing prototypical as well as commercial application based on the platform. Could be interesting ...

Resources