How to forecast a dynamic number of time series in Watson Studio Modeler Flow

In Watson Studio Modeler Flow, how do I forecast time series when the number of series is dynamic?
I have studied tutorials and demos but I have only found methods that require each field's type to be manually specified. This is not feasible when there are hundreds of time series and the exact number of series changes between runs.
I need to be able to run a forecasting job even if the number of time series changes (without opening the flow in Modeler to manually select a Type for each series). Since this is a trivial for-loop in Python, I expect Modeler Flow to have some way to handle it.
My desired flow (step 2 is the problem):
1. Import data from an external data source with a Connection.
2. Create forecasts for each individual series in the data. The number of series may vary between runs. (All the series have the same time values so they could share the time field.)
3. Export results back to the source system.
Help would be appreciated :)

I asked the technical team about your question. Their reply is: "Currently Modeler requires specific target field names for Time Series. It is not supported to claim 'target = all fields'."
The feature is currently not supported, but you can file a suggestion here: https://ibm-data-and-ai.ideas.ibm.com/?category=6982949146944943308.
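For what it's worth, the "trivial for-loop in Python" can be run in a Watson Studio notebook instead of Modeler Flow as a workaround. A minimal sketch, assuming a wide pandas DataFrame with one time column plus one column per series (the column names, the model choice, and the forecast horizon are all illustrative):

    # Sketch: forecast every series column in a wide DataFrame, however many there are.
    # Assumes statsmodels is available; any per-series model could be substituted.
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    def forecast_all(df: pd.DataFrame, time_col: str = "date", horizon: int = 12) -> pd.DataFrame:
        df = df.set_index(time_col)
        forecasts = {}
        for col in df.columns:  # every remaining column is treated as a series
            fitted = ExponentialSmoothing(df[col].astype(float), trend="add").fit()
            forecasts[col] = fitted.forecast(horizon)
        return pd.DataFrame(forecasts)

    # usage: result = forecast_all(raw_df); write `result` back to the source system

Since the loop just iterates over whatever columns arrive, a series count that changes between runs needs no manual Type selection.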

Related

Time-series charts for large amounts of data

I have a couple of thousand time series covering several years at second granularity. I'd like to store the data in a suitable DB, i.e. one that scales well and can retain all data at the original granularity (e.g. Druid, OpenTSDB or similar). The goal is to be able to view the data in a browser (e.g. by entering a time frame, ideally with zoom/pan functionality).
To limit the number of datapoints my webserver needs to handle, I'd like functionality that seems to work out of the box for Graphite/Grafana (which, if I understand correctly, is not a good choice for long-term retention of data):
a time-series chart in Grafana will limit data by querying aggregations from Graphite (e.g. return the mean value over 30m buckets when zooming out, while showing all data when zooming in).
Now the questions:
Are there existing visualization tools for time-series DBs that provide this functionality?
Are there existing charting frameworks that allow me to customize the data queried per zoom level?
Feedback on the choice of DB is also welcome (open-source preferred).
You can absolutely store multiple years of data in Graphite. The issue you'll have is that Graphite selects the aggregation level to read from by locating the highest-resolution archive that covers the requested interval, so you can't automatically take advantage of aggregation to get both efficient long-term graphs and the ability to drill down to the raw data for a time period in the past.
One way to get around this is to use carbon-aggregator to generate multiple output series with different intervals from your input series, so you have my.metric.raw, my.metric.10min, my.metric.1hr, etc. You'd combine that with a carbon storage schema that defines a matching interval and retention for each of the series, so my.metric.raw is stored at 1-second resolution, .10min at 10-minute resolution, and so on.
If you do that, then in Grafana you can use a template variable to choose which interval you want to graph from: define a variable $aggregation with options raw, 10min, etc., and write your queries like my.metric.$aggregation.
That will give you the performance that you need with the ability to drill into the raw data.
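To make that concrete, here is a rough sketch of the two carbon config pieces involved; the metric pattern, intervals and retentions are illustrative, not taken from the question:

    # aggregation-rules.conf: fan each .raw series out into coarser copies
    <metric>.10min (600) = avg <metric>.raw
    <metric>.1hr (3600) = avg <metric>.raw

    # storage-schemas.conf: give each suffix a matching resolution and retention
    [raw]
    pattern = \.raw$
    retentions = 1s:30d

    [ten_minute]
    pattern = \.10min$
    retentions = 10m:2y

    [one_hour]
    pattern = \.1hr$
    retentions = 1h:5y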
That said, we generally find that while everyone thinks they want lots of historical data at high granularity, it's almost never actually used and is typically an unneeded expense. That may not be the case for you, but think carefully about the actual use-cases when designing the system.

Time Sensitive Collaborative Filtering

I am trying to use collaborative filtering to recommend items to users based on their past purchases. I have created a user vector representing each user's usage, and an item vector (A) whose values are populated as the probability of B given A. The objective is to capture, in the item vector representation, which items tend to be sold together. Now I need to find the time at which these recommendations should be presented. Since the items I am recommending are used periodically, timing is very important.
So I am trying to explore constraint-based recommendations to make my recommendations time-sensitive. The approach I am considering is to create a time-sensitive constraint based on the last date of purchase and the average consumption rate. The problem is that creating constraints at the user level will become computationally expensive.
I would appreciate suggestions on this approach, or on any better way to implement it. In short, I want to build a recommendation engine using customers' usage data for items that are consumed and need to be purchased again. The output should be both the list of recommendations and the timing for presenting each recommendation to the user.
Thanks
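A minimal sketch of the timing rule described in the question, with illustrative column names and a pandas-based implementation that is only an assumption: recommend an item once the time since the user's last purchase of it exceeds their average repurchase interval.

    # Sketch: flag (user, item) pairs that are "due" for repurchase.
    # Assumes a purchase log with columns user_id, item_id, purchase_date.
    import pandas as pd

    def due_recommendations(purchases: pd.DataFrame, today: pd.Timestamp) -> pd.DataFrame:
        purchases = purchases.sort_values("purchase_date")
        grouped = purchases.groupby(["user_id", "item_id"])["purchase_date"]
        stats = grouped.agg(last_purchase="max",
                            avg_interval=lambda s: s.diff().mean())
        stats = stats.dropna(subset=["avg_interval"])  # needs at least two purchases
        stats["next_due"] = stats["last_purchase"] + stats["avg_interval"]
        return stats[stats["next_due"] <= today].reset_index()

Computed once per day over the purchase log, this avoids maintaining an explicit constraint per user.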
The way I see it, there are two basic options you can pursue. On the one hand, the temporal features can be incorporated as additional information, turning the model into a kind of hybrid recommender; the Python package "lightfm" is a good example.
On the other hand, the problem can also be modeled as a time series problem. A well-known paper dealing with next basket recommendations is "A Dynamic Recurrent Model for Next Basket Recommendation", and there are already implementations of it on GitHub.
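If you go the hybrid route, a minimal lightfm sketch might look like the following; the matrix sizes, the "recency bucket" user feature and the random placeholder data are all assumptions for illustration:

    # Sketch: hybrid recommender where a time signal (recency bucket) is a user feature.
    import numpy as np
    from scipy.sparse import coo_matrix, csr_matrix
    from lightfm import LightFM

    n_users, n_items, n_buckets = 1000, 200, 4

    # user-item interactions (1 = purchased); random placeholders for illustration
    interactions = coo_matrix(
        (np.ones(5000),
         (np.random.randint(0, n_users, 5000), np.random.randint(0, n_items, 5000))),
        shape=(n_users, n_items))

    # user features: identity block plus a one-hot "days since last purchase" bucket
    recency = np.random.randint(0, n_buckets, n_users)
    user_features = csr_matrix(np.hstack([np.eye(n_users), np.eye(n_buckets)[recency]]))

    model = LightFM(loss="warp")
    model.fit(interactions, user_features=user_features, epochs=10)

    # score all items for user 0; rank by score to build the recommendation list
    scores = model.predict(0, np.arange(n_items), user_features=user_features)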

Automate solving of customer technical issues (Production L3 tickets)

I want to develop an app which understands text from various inputs and makes decisions accordingly. Further, if at any point the system gets confused, the user can manually supply the correct output, and from then on the system must learn to give that output in similar scenarios. Basically, the system must learn from its past experience. The job I want to handle with this system is the mundane work of resolving customer technical problems (Production L3 tickets). The first input would be the customer's description of the problem with an order (e.g. the state in which the order is stuck and the state to which it should be pushed), and the second input would be the current state of the order (data retrieved for that order from multiple DB tables). For these two inputs, the output would be the desired action to take, such as updating certain columns and firing an XML message for that order. The tools I think would be required are a natural language processing (NLP) library for understanding the text, and machine learning so the system can learn from past confusing scenarios.
If you want to use Java libraries for your NLP pipeline, have a look at OpenNLP; you get a lot of basic support there.
Then there is deeplearning4j, which offers a lot of neural network implementations in Java.
Since you want a dynamic model that can learn from past experience rather than a static one, deeplearning4j gives you a number of neural network implementations to experiment with.
Hope this helps!
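Purely as an illustration of the learn-from-corrections loop (in Python with scikit-learn rather than the Java libraries above), a sketch could look like this; the action labels and the idea of concatenating ticket text with the order state are assumptions:

    # Sketch: incrementally trained text classifier mapping ticket text + order
    # state to an action, updated whenever a human supplies the correct answer.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    ACTIONS = ["push_to_shipped", "reset_payment", "resend_xml"]  # example labels
    vectorizer = HashingVectorizer(n_features=2**18)
    clf = SGDClassifier()

    def handle_ticket(ticket_text: str, order_state: str) -> str:
        x = vectorizer.transform([ticket_text + " " + order_state])
        try:
            return ACTIONS[int(clf.predict(x)[0])]
        except Exception:  # not trained yet: escalate to a human
            return "ASK_HUMAN"

    def learn_from_correction(ticket_text: str, order_state: str, action: str) -> None:
        x = vectorizer.transform([ticket_text + " " + order_state])
        clf.partial_fit(x, [ACTIONS.index(action)],
                        classes=list(range(len(ACTIONS))))

The same pattern (predict, fall back to a human, then partial-fit on the correction) carries over to OpenNLP/deeplearning4j models.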

Methods for Cost Analysis in Google Sheets

I have the following spreadsheet:
Cost Analysis Google Sheet
I am trying to think of the best way to analyze the cost impact on a Variant based on the different components of the product. I am having a very tough time being creative and coming up with ways to do cost trade-off analysis in Google Sheets.
Basically, I am trying to find methods within Sheets that help me visualize the added value of certain components for a Variant.
I know that this is difficult to do without any domain knowledge of the application, but I am hoping that someone has some more general ideas for how to do some reporting and visualization of data like this!
Thanks so much!
You have already added conditional formatting, which I consider great for visually scanning the tables, given that I don't think this model would be improved by a graph. First, I would recommend changing the conditional formatting to a gradient and having the green extreme be the largest negative value in the Diff columns. Second, if you want the simplest visualization possible, you can build a rank list.
This would work like a dashboard, presenting the variants with the information you want. Here is an example, which takes the PN-column of the row with the lowest Diff-value in the Variant 3-rows: =index(G3:I16,match(vlookup(SMALL(I3:I16,1),I3:I16,1,false),I3:I16),1)
You can then alter the rank and the offset to get a list of the top-ranked variants with the columns you want.
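More generally, and only as a sketch assuming the same layout (Diff values in I3:I16, the columns to report in G3:I16), the row with the k-th lowest Diff can be pulled with something like =INDEX(G3:I16, MATCH(SMALL(I3:I16, k), I3:I16, 0), 1), changing the final argument to pick a different column.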
Hope that helps for the visualization. For advice on organization, I believe a Google Sheets forum is less appropriate :)

Best db engine for building a web app with ranking algorithms

I've got an idea for a new web app which will involve the following:
1.) lots of raw inputs (text values) that will be stored in a db - some of which contribute as signals to a ranking algorithm
2.) data crunching & analysis - a series of scripts will be written which together form an algorithm that will take said raw inputs from 1.) and then store a series of ranking values for these inputs.
Events 1.) and 2.) are independent of each other. Event 2 will probably happen once or twice a day. Event 1 will happen on an ongoing basis.
I initially dabbled with the idea of writing the whole thing in node.js sitting on top of MongoDB, as I was curious to try out something new, and while I think node.js would be perfect for event 1.), I don't think it will work well for event 2.) outlined above.
I'd also rather keep everything in one domain rather than mixing node.js with something else for step 2.
Does anyone have any recommendations for what stacks work well for computational type web apps?
Should I stick with PHP or Rails/MySQL (which I already have good experience with)?
Is MongoDB/nosql constrained when it comes to computational analysis?
Thanks for your advice,
Ed
There is no reason why node.js wouldn't work.
You would just write two node applications:
one that takes input, stores it in the database, and renders output,
and another that crunches the numbers in its own process and runs once or twice per day.
Of course, if you're doing real number crunching and you need performance, you wouldn't do no. 2 in node/ruby/php; you would do it in Fortran (or maybe C).

Resources