Should I use a framework or a self-made script for machine learning workflow automation? - machine-learning

For a personal project I am trying to automate the workflow of my machine learning model, but I have some questions about how to approach it professionally.
At the moment I am doing the following tasks manually:
From the raw data, I extract the data that interests me into a directory with the help of third-party software (to which I pass the extraction parameters as arguments).
Then I run another program, or in some cases one or more of my own Python scripts, to pre-process the data, which is stored in a new directory.
Finally, I feed the processed data to one of my models, which returns the labeled data that I store in a last directory.
[Figure: process diagram of the steps described above.]
The steps (extract, pre-process and model) are always executed in the same order, but I change the scripts, software parameters and model according to my needs or the comparison I want to make.
All my scripts are stored in an organized script directory, and the third-party software is called on the command line from a Python script.
My goal is to have a script or program that runs the whole loop by itself. As input it would take the raw data (or the directory where it is stored) and the parameters needed to run the loop with the desired modules (and their parameters).
The number of module and parameter combinations is so large that I can't write a script for each one, which is why I want to build something very modular.
I can write my own script, but I would like to take a more professional approach, as if I had to implement it for a company.
My questions: in my case (customizable, interchangeable modules), would it be more appropriate to use a framework (e.g. Kedro or any other) or to build it myself (because my needs are too specific)? If a framework is appropriate, which one should I choose, and why?
I've been researching existing frameworks, but besides not being sure whether they fit my needs, there are so many that I'd like to spend my time on one that could also help me in future projects or professional work.
Thank you.
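For illustration only, the self-made route could start from something like the sketch below: each step is registered under a name, and a configuration list picks the steps and their parameters. All step names, functions and parameters here are hypothetical stand-ins (the real extraction would call the third-party software), and this is not how Kedro or any other framework does it.

```python
from typing import Any, Callable, Dict, List

# Registry mapping a step name to a callable. All names are hypothetical.
STEP_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable:
    """Decorator that makes a function selectable by name from a config."""
    def wrap(func: Callable) -> Callable:
        STEP_REGISTRY[name] = func
        return func
    return wrap


@register("extract_every_nth")
def extract_every_nth(data: List[float], n: int = 1) -> List[float]:
    # Stand-in for the third-party extraction tool.
    return data[::n]


@register("scale")
def scale(data: List[float], factor: float = 1.0) -> List[float]:
    # Stand-in for a pre-processing script.
    return [x * factor for x in data]


@register("threshold_model")
def threshold_model(data: List[float], cutoff: float = 0.0) -> List[int]:
    # Stand-in for a model that labels each sample.
    return [int(x > cutoff) for x in data]


def run_pipeline(raw: Any, config: List[Dict[str, Any]]) -> Any:
    """Run the configured steps in order, feeding each output to the next."""
    result = raw
    for step in config:
        func = STEP_REGISTRY[step["name"]]
        result = func(result, **step.get("params", {}))
    return result


# One combination of modules and parameters, expressed as data.
config = [
    {"name": "extract_every_nth", "params": {"n": 2}},
    {"name": "scale", "params": {"factor": 10.0}},
    {"name": "threshold_model", "params": {"cutoff": 25.0}},
]
labels = run_pipeline([1.0, 2.0, 3.0, 4.0, 5.0], config)
print(labels)
```

Because a combination is just a list of names and parameters, comparing two setups means swapping one config entry rather than writing a new script.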

Related

Saving different sets of values of variables with a changing structure

I have several sets of values (factory settings, user settings...) for a structure of variables, and these values are saved in binary files. When I want to apply a certain setting, I just load the specific file containing the desired values, and these values are applied to the variables according to the structure. This works fine as long as the structure of variables doesn't change.
I can't figure out how to do it when I add a variable but need to retain the values of the rest (when the structure in the program changes, I need to update the files so that they contain the new values according to the new structure and at the same time keep the old ones).
I'm using a PLC system that is written in ST language. But I'm looking for some overall approach for solving this issue.
Thank you.
It is not easy to provide a solution that is generic and works across different PLC platforms. There are many ways to accomplish this, depending on the system or interface you actually want to use (e.g. PLC source code, OPC, ADS, MODBUS, or special functions and add-ins from the vendor), and there are further possibilities such as language features on the PLC. I wrote three solutions to this with C#/ST (with OOP extensions) and ADS/OPC communication: one that parses the source code in C# first, one with automatic generation on the PLC side, and one with an automatic registration system for the parameters that uses an Entity Framework-compatible database as the parameter store. If you don't want to invest too much time in this, try the parameter management systems provided by your PLC vendor and live with their restrictions.
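As a platform-independent illustration of one possible approach (not one of the three solutions mentioned above, and not ST code), values can be stored under their variable names instead of at fixed binary offsets, and then merged with the defaults of the current structure when loaded. The sketch below is in Python with a JSON file standing in for the settings file; the variable names and values are hypothetical.

```python
import json
from typing import Any, Dict


def migrate_settings(saved: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return values for the current structure, reusing saved values by name."""
    # Start from the current defaults so newly added variables get a value.
    merged = dict(defaults)
    for name, value in saved.items():
        # Variables that no longer exist in the structure are dropped.
        if name in defaults:
            merged[name] = value
    return merged


# Hypothetical settings file written before "ramp_time" was added.
old_file = json.dumps({"speed": 120, "pressure": 3.5})
current_defaults = {"speed": 100, "pressure": 2.0, "ramp_time": 0.5}

user_settings = migrate_settings(json.loads(old_file), current_defaults)
print(user_settings)
```

The old values for speed and pressure survive, and the new variable falls back to its default until the file is saved again.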

Filemaker Alternatives

I'm looking for an alternative to FileMaker Pro. I've been playing with a trial for a week now.
I'm looking for a rapid application development platform for small relational databases that runs on iOS and OS X.
Things I like about FM
Can make reasonable looking layouts quite quickly.
Can access the database from an iPad with Filemaker Go.
Things I don't like about FM
EVERYTHING takes a half a dozen clicks. In particular constructing a script with mouse clicks is painful.
The number of modal dialog boxes is astounding. It is routine to have them layered 3 deep.
Syntax is verbose, e.g. Set Variable [ $name Value:value ]. Some of the examples start to look like Excel formulas (Excel is a write-only language...) or COBOL.
Near as I can figure variable scope is either local or global. If a script calls a script, you must call it with any local variables you want it to have access to.
Debugging is very difficult in the FM Pro version.
Doesn't seem to be any provision for building a library of functions in a single file.
No clear and obvious guide to how to document your database so that it can be maintained.
No clear and obvious way to print out all your scripts.
No clear and obvious way to print out a calling tree/dependency tree.
No clear guide to best practices.
The short answer is: despite its shortcomings (and I'll admit it has many), FileMaker is still the best rapid-development platform for OS X and iOS (and Windows, for that matter). The closest second place (for OS X/iOS) I can think of would be Cocoa/Cocoa Touch with Core Data, with Ruby on Rails for a web interface a distant third.
Having said that, I can offer a few tips for some of your complaints:
If you're a keyboard-centric person like myself, turn on Full Keyboard Access (in the Keyboard System Preference within the Shortcuts tab). This will allow you to tab through all of the controls, such as buttons, which makes it much easier to select deep dialog options from the keyboard. For example, when building a script, you can use the tab key to focus on the list of script steps, then type a few letters of the step you want, which will highlight it, and press return, which will add it to the script. Then, while a script step in the script is highlighted, you can use Ctrl-Up and Ctrl-Down to move the step up and down in the execution order.
Script variables, both local and global, can be set within any calculation. For example, if you're capturing a primary key value to a local variable and you already have an If script step, you can do the capture within the If script step.
If[ Let( [ $record_id = Table::ID ]; not IsEmpty( $record_id ) ) ]
Similarly, if you have a number of Set Variable script steps in a row, you can combine them into one:
Set Variable[ $? Value:Let( [ $var1 = 1; $var2 = "two" ]; "" ) ]
This sets the $? variable to an empty string, but has the side effect of also setting $var1 and $var2.
You're correct that variables are either local to a script (or calculation) or global to the file. If you want to share information between scripts, parameters are the solution. For my personal solution for sending multiple parameters to a script, read my article on Multiple FileMaker Script Parameters.
If you're going to do any amount of custom development with FileMaker, you really want to get FileMaker Pro Advanced, which, in addition to a step-level debugger, offers the ability to create custom menus and, my personal favorite, custom functions. Using custom functions (which can easily be brought from one file to another), you can build a complex library of functions.
To print out all of your scripts, open Manage Scripts, select all of the scripts with Cmd-A and click the print button on the bottom right of the window.
For script dependencies, look into BaseElements, a FileMaker-based solution for documenting FileMaker systems.
There is no standard set of "best practices" across the board, and because of how FileMaker organizes its objects, documentation often ends up in various places (script comments, calculation comments, field comments). Still, there are many ways to build a system in FileMaker so that you increase its maintainability. Unlike Objective-C or PHP, where you can be fairly certain where the comment for something will be (either in the declaration or at its first use), FileMaker is more flexible. The important idea behind "best practices" and documentation, in my opinion, is consistency. If you comment a field by using the field comments, always comment fields that way; don't comment calculation fields within the calculation or use dummy validation to put comments in a calculation there.
If you're looking for one guide (but not the only guide) for best practices, check out FileMaker Coding Standards. I use some of those guidelines, and others are my own that have evolved over time.
Finally, if you're looking for generally great material on how to get the most from FileMaker, check out FileMaker Magazine, published by one of the people involved with the FileMaker Coding Standards site.
The truth is, if you're coming from some more conventional development platform, FileMaker is going to take a bit of getting used to. I've been using it for over 20 years, so I'll admit it's probably difficult for me to completely empathize with that situation. But if you give it a bit of a chance, I think you'll find that there's no other platform available that can build complex database systems for OS X and iOS so quickly.
FileMaker takes a lot of getting used to. It's very different from SQL or any of the mainstream taught languages, so if you have had some training you will need to rethink how to reach the same end goal.
If you are serious about it, get FileMaker Pro Advanced v14, which should fix some of your GUI editing issues, then join developer.filemaker.com and do the training course that you can download from there.
Once you have done that and gained some experience, you will find FileMaker is very RAD. Also, there IS a way to get around any shortcoming; everything is possible in FileMaker.
As for passing multiple parameters to a script, a quick and easy way to do it for 99.5% of cases is this:
Calling the script - In the parameters box, separate your parameters with a carriage return like so: "parameter 1" & "¶" & "parameter 2" & "¶" & "parameter 3" etc.
In your receiving script, use GetValue(get(scriptparameter),1) for parameter 1, 2 for parameter 2, etc.
This technique won't work when you are trying to pass text that contains carriage returns, but that is the exception.
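To make the limitation concrete, here is a small Python imitation of the idea (not FileMaker code). The helper names are hypothetical, and returning an empty string for an out-of-range index is a choice made for this sketch.

```python
from typing import List

SEPARATOR = "\r"  # stands in for FileMaker's pilcrow (a carriage return)


def pack_parameters(values: List[str]) -> str:
    """Join parameters into one string, one parameter per line."""
    return SEPARATOR.join(values)


def get_value(packed: str, index: int) -> str:
    """1-based lookup of a line; returns '' when the index is out of range."""
    parts = packed.split(SEPARATOR)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


# Normal case: each parameter comes back at its own position.
packed = pack_parameters(["Alice", "42"])
print(get_value(packed, 1), get_value(packed, 2))

# Failure case: a parameter that itself contains a carriage return
# shifts every parameter after it by one position.
broken = pack_parameters(["line one\rline two", "42"])
print(repr(get_value(broken, 2)))
```

In the failure case the second lookup returns the tail of the first parameter instead of "42", which is exactly the exception described above.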

External data source with specflow

I find entering data in SpecFlow feature files very painful, especially when the data is large and repetitive. Can we use an external data source such as a spreadsheet to enter this data and then use that external data source in the feature file?
It's theoretically possible, but probably so much effort that you wouldn't want to do it.
The problem is that the feature file is simply a human readable form. When it is saved in Visual Studio it is parsed and converted into the feature.cs file and that is the one that is compiled and used for testing.
So your process would become
edit spreadsheet
export to feature file
get SpecFlow's VS plugin to convert to feature.cs
run msbuild
run tests via NUnit or similar
I wouldn't do this. Instead I'd focus on making my tests better examples. It sounds like you are trying to exhaustively cover every possibility. Don't come up with examples to cover every possible case; instead, cover as much logic as possible with fewer tests.
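If you do go the export route anyway, the "export to feature file" step is mostly text generation. The sketch below is in Python and is not part of SpecFlow; it assumes the spreadsheet was saved as CSV, and the function name and sample data are hypothetical. It renders the rows as a Gherkin Examples table that could be pasted under a Scenario Outline.

```python
import csv
import io
from typing import List


def csv_to_examples_table(csv_text: str, indent: str = "    ") -> str:
    """Render CSV rows as a Gherkin 'Examples:' table with aligned columns.

    Assumes every row has the same number of columns as the header row.
    """
    rows: List[List[str]] = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Width of each column is the longest cell in that column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = [indent + "Examples:"]
    for row in rows:
        cells = " | ".join(cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(f"{indent}  | {cells} |")
    return "\n".join(lines)


# Hypothetical spreadsheet content exported as CSV.
sheet = "amount,currency,expected\n10,EUR,ok\n-5,USD,rejected\n"
table = csv_to_examples_table(sheet)
print(table)
```

The generated text still has to go through the feature.cs generation and build steps listed above, which is where most of the effort lies.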

Visual statechart editor for non-programmers, with limited conditions, events and actions

I'm looking for a visual statechart editor for my customer. I'm building a server application for him, and he needs a tool to build statecharts and upload them to the servers. Of course, the tool needs to be able to export to some readable format (such as SCXML) so that I can build a reader for it.
I have seen some tools, like fsm-editor, but they don't suit me because I want to limit my customer to a specific set of parametrized conditions, parametrized events and parametrized actions.
For example, I'll define:
conditions: coIsDoorOpen, coIsThereNAppelsOnTheTree(n as uint[0..200]), ...
events: evLightOn, evLightOff, evTimeout(ms as uint[1..10,000]), ...
actions: acSetAlarmOn, acCloseWindowN(n as uint[1..10]), ...
My customer could then build a few dozen statecharts with those explicit predefined attributes (conditions, events and actions) and upload the exports to the appropriate places.
There is no need to adhere strictly to one statechart standard or another, but I need support for these things:
parametrized conditions/events/actions
actions on entering/exiting a state
no need to support inner variables; I can use actions and conditions for that.
Is there any tool for it (preferably free)?
If not, is there any open-source (C# / JS) implementation of an editor that supports all of the above without restricting the conditions/events/actions, which I could easily modify to add the requested strict mode?
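Whatever editor is used, the strict mode could also be enforced when the export is read on the server. The Python sketch below checks references against the whitelist from the example lists above. The tuple layout of a parsed transition is an assumption made for illustration; a real reader would first parse the SCXML (or other) export into such a form.

```python
from typing import Dict, Optional, Tuple

# Allowed names mapped to the inclusive range of their single parameter,
# or None when the item takes no parameter (taken from the example lists).
ALLOWED: Dict[str, Dict[str, Optional[Tuple[int, int]]]] = {
    "condition": {"coIsDoorOpen": None, "coIsThereNAppelsOnTheTree": (0, 200)},
    "event": {"evLightOn": None, "evLightOff": None, "evTimeout": (1, 10_000)},
    "action": {"acSetAlarmOn": None, "acCloseWindowN": (1, 10)},
}


def check(kind: str, name: str, arg: Optional[int] = None) -> Optional[str]:
    """Return an error message, or None when the reference is allowed."""
    allowed = ALLOWED[kind]
    if name not in allowed:
        return f"unknown {kind}: {name}"
    bounds = allowed[name]
    if bounds is None:
        return None if arg is None else f"{name} takes no parameter"
    if arg is None or not bounds[0] <= arg <= bounds[1]:
        return f"{name} needs a parameter in [{bounds[0]}..{bounds[1]}]"
    return None


# Hypothetical transition as it might look after parsing the export.
transition = [
    ("event", "evTimeout", 500),
    ("condition", "coIsDoorOpen", None),
    ("action", "acCloseWindowN", 11),  # out of range on purpose
]
errors = [msg for item in transition if (msg := check(*item)) is not None]
print(errors)
```

Rejecting an upload that produces any errors keeps the customer inside the predefined set even if the editor itself is not restricted.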
Based upon your needs, my knee-jerk reaction of recommending Visio or Dia would be inappropriate here. You appear to require a tool with some form of an API or descriptive language to lock users to a constrained set of components Lemmings-style, and your needs would best be serviced by something relatively simple if possible.
I'm curious why altering the source code to SCXML-GUI (fsm-editor) or Violet would not solve your needs, however. You seem to indicate that an open source utility written in C# or JavaScript is most desirable, which I cannot easily locate.
But, in the interests of completeness, here's a comparable question that may help your search. Most notably, this appears to be exactly what you desire and may be worth purchasing.
Best of luck with your project.

Managing Team Development with SSAS, TFS, & BIDS

I am currently the sole BI developer for a corporate data warehouse and cube. I use SQL Server 2008, SSAS, and SSIS as my basic toolkit, and Visual Studio + BIDS and TFS for my IDE and source control. I am about to take on multiple projects with an offshore vendor and I am worried about managing change. My major concern is managing merges and changes between me and the offshore team. Merging and managing changes to SQL and XML for just one person is bad enough, but with multiple developers it seems like a nightmare. Any thoughts on how best to structure development, knowing that sometimes there is no way to avoid multiple individuals making changes to the same file?
SSIS, SSAS and SSRS files are not merge-friendly. They are stored as XML files that change drastically even with minor edits (such as changing a property), so merging them becomes practically impossible.
So stop thinking about parallel development on one file. You need to think about how to arrange things so that people do not need to work on the same file in parallel. Start by disabling multiple checkout of a file. You might even want to consider enabling the option to get the latest version on checkout.
Then start thinking about how people can work independently. This is mostly a matter of how you structure the work and the files:
Give people their own area they can work on. One SSIS package is only developed by person X at any given moment in time.
Make smaller files so that the chance that two people need to work in the same file is small.
I have given feedback to the product team about the incompatibility of BIDS with merging. It is a known issue, but it will be hard to tackle. They don't know when it will be possible to really do parallel development on these files. Until then, keep away from parallel development.
As Ewald Hofman mentioned, SSAS and SSIS are not merge-friendly.
In one environment I worked in, we solved the problem as follows:
only use SSIS when you have to (fuzzy matching algorithms or something similar). Replace SSIS packages with SQL code as often as you can (see, for instance, Linked Server for data sync and the MERGE command for creating dimension and fact tables).
build your data warehouse structure as follows:
build 2 databases, one for the "raw source data" from the source systems and one (the "stage" database) for the dimension and fact views and tables
use procedures that can deploy the whole "stage" database
put the structure for the "stage" database into your Repository
build a C# application that builds your dimensions and cubes via the AMO API (I know, that's a tough job at the beginning, but it is worth it - think about what you gain and look at the pros below)
add the stage database and the C# application to your Repository (TFS/Git etc.)
Pros of that structure:
you have a merge-able structure you can put in your Repository
you are using the AMO API, which has these benefits:
you can automate the generation of new partitions
you can use procedures to automate and clone measure groups to different cubes (what I think is sometimes a big benefit!)
you could outsource your translation and import it easily (the cube designer is probably not the best translator)
Cons:
the vendor would probably not adopt that structure
you have to pay more (because of either higher skill requirements or the cost of teaching them your individual structure)
you probably need knowledge of a new language, C#, if you don't already have it
Conclusion:
there are possibilities to get a merge-friendly environment
you will lose nice click-and-run tools such as BIDS, but you will gain a highly automated process
outsourcing may become unprofitable because of the high degree of individualization
http://code.google.com/p/support/wiki/DVCSAnalysis
maybe a better tag is DVCS?
https://stackoverflow.com/questions/tagged/dvcs
As long as both teams are using BIDS and TFS, this should not be a problem.
Assuming that your T-SQL code is checked in to source control as a single file per object, merging T-SQL code is straightforward since it is text based. I have found that the VSTS Database projects help with this.
Merging the XML-based source files of SSIS and SSAS can be cumbersome, as you indicate. To alleviate some of the pain, I find that keeping each package limited to a single data flow or logical unit of work helps reduce developer contention on packages. I then call these packages from one or more master packages. I also try to externalize all of my T-SQL source queries using stored procedures, views or UDFs so that the need to edit the package is further reduced. Using configuration files and variables also helps to a smaller extent.
SSAS cubes are a little bit tougher. My best suggestion is to look into a third-party XML differencing tool. I have been able to successfully merge small changes using the standard text-based tools, but it can be a daunting task.
