HIGHCHARTS / GANTT Multiple tasks same lane using hierarchical structure - highcharts

We have several lanes (Y categories), and we usually create several lanes per category. In my example, the category LXXXXXX has 2 lanes: one with milestones and one with a pill. Can we set up this type of structure in a hierarchical way? Thank you for all your help.

Related

Hierarchical labels or dense nodes?

(I am new to Neo4J and very excited about it)
Here is my conceptual question:
Suppose we want to represent life on earth (based on a biological taxonomy hierarchy).
However, suppose at the leaves of the taxonomy tree we want to identify individual organisms. For example, at the Mammalia branch, in the Homo sapiens sub-branch, we want to identify each of the 7 billion humans, and do the same for some other branches (give an ID to every known living great ape left in the wild, and so on).
Is this type of organization done with dense nodes (in the billions), or is it done with extensive use of labels (do labels support nesting)?
From my point of view, it's better to use multiple nodes instead of multiple labels.
But it depends on the use case and what you want to do with it.
Neo4j doesn't support nested labels or any kind of label hierarchy.
Here are some resources that could be interesting for you:
Graph Databases in Life Sciences: Bringing Biology Back to Its Nature
Open Tree of Life and Neo4j

how to build an efficient ItemBasedRecommender in Mahout?

I am building an Item Based Recommender System for 10 million users who
rate categories out of 20 possible categories (news categories like politics,
sport, etc.).
For each user, I would like to recommend at least one other
category that they don't know (no rating).
I ran a GenericUserBasedRecommender and asked for recommendations for
each user, but it is extremely slow: about 1,000 users processed per minute.
My questions are:
1- Can I run this same GenericUserBasedRecommender on Hadoop, and would it
really be faster? I have seen and run an ItemBasedRecommender from the command line on
a cluster, but I would rather run a user-based one.
1.5- I saw many users not getting a single recommendation. What criterion does the algorithm use to determine whether a user gets a recommendation? I thought the users who don't get recommendations might be the ones who gave only a single rating, but I don't understand why.
2- Is there another, smarter way to deal with my problem? Maybe some clustering
solution instead of recommendation? I don't exactly see how.
3- Finally, am I right in saying that the algorithms that have no command line
are not meant to be used with Hadoop?
Thank you for your answers.
Sometimes you won't get recommendations for certain items or users because they overlap on too few items. It can also happen when a user has 'enough' data, but their behaviour or usage patterns are very unusual and/or disagree with the popular trends in the data.
You could perhaps try LogLikelihood or Tanimoto based ItemSimilarity.
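To make the overlap point concrete, here is a minimal Ruby sketch of the Tanimoto (Jaccard) coefficient computed over the sets of users who rated each category. It is not Mahout's implementation; the `ratings` data and the `tanimoto` helper are hypothetical names used only for illustration.

```ruby
require 'set'

# Hypothetical toy data: category => set of user IDs who rated it.
ratings = {
  'politics' => Set[1, 2, 3],
  'sport'    => Set[2, 3, 4],
  'culture'  => Set[5]
}

# Tanimoto coefficient: size of the intersection divided by size of the union.
# Rating values are ignored; only "who rated what" matters.
def tanimoto(a, b)
  union = (a | b).size
  return 0.0 if union.zero?

  (a & b).size.to_f / union
end

puts tanimoto(ratings['politics'], ratings['sport'])
puts tanimoto(ratings['politics'], ratings['culture'])
```

Pairs with no raters in common score 0.0, so a category (or user) with little overlap contributes nothing when recommendations are assembled.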
Another thing you could look into is a matrix factorization based model. You could use the ALSWRFactorizer to generate recommendations. This method approximates the original user-item matrix as the product of low-rank user-feature and item-feature matrices, and then reconstructs from them a dense matrix that is close to the original on the known ratings. You lose some detail with this method, but the missing values in the user-item matrix are imputed, so you get estimated preference values to recommend from.
If you have the features and not just implicit ratings, you could probably experiment with clustering techniques, perhaps start with Hierarchical Clustering.
I did not quite get your last question.

Predicting Football match winners based only on previous data of same match

I'm a huge football (soccer) fan and interested in machine learning too. As a project for my ML course I'm trying to build a model that predicts the home team's chance of winning, given the names of the home and away teams. (I query my dataset and create data points based on previous matches between those 2 teams.)
I have data for several seasons for all teams, but I have the following issue that I would like some advice on. The EPL (English Premier League) has 20 teams which play each other home and away (380 total games in a season). Thus, each season, any 2 teams play each other only twice.
I have data for the past 10+ years, resulting in 2*10=20 data points for each pair of teams. However, I do not want to go back more than 3 years, since I believe teams change considerably over time (Man City, Liverpool) and this would only introduce more error into the system.
So this results in only around 6-8 data points for each pair of teams. However, I do have several features (20+) for each data point, like full-time goals, half-time goals, passes, shots, yellows, reds, etc. for both teams, so I can include features like recent form, recent home form, recent away form, etc.
However, having only 6-8 data points to train with seems wrong to me. Any thoughts on how I could counter this problem (if it is a problem in the first place)?
Thanks!
EDIT: FWIW, here's a link to my report which I compiled at the completion of my project. https://www.dropbox.com/s/ec4a66ytfkbsncz/report.pdf . It's not 'great' stuff but I think some of the observations I managed to elicit were pretty cool (like how my prediction worked very well for the Bundesliga because Bayern win the league all the time).
That's an interesting problem, and I don't think it has a unique solution. However, there are a couple of things I would try in your position.
I share your concern that 6-8 points per class is too little data to build a reliable model, so I would model the problem a bit differently. To have more data for each class, instead of one class per team (20 classes) I would have only two (home/away), and I would add two features: one for the home team and one for the away team. In that setup you can still predict which team would win given whether it plays home or away, and the model has more data to learn from.
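A minimal Ruby sketch of that reframing, under stated assumptions: the fixtures are made up, the helper names `one_hot` and `build_dataset` are hypothetical, and the label is simplified to "home win or not" (draws count as 0).

```ruby
# Hypothetical fixtures; team names and scores are made up for illustration.
matches = [
  { home: 'Arsenal', away: 'Chelsea', home_goals: 2, away_goals: 1 },
  { home: 'Chelsea', away: 'Everton', home_goals: 0, away_goals: 0 },
  { home: 'Everton', away: 'Arsenal', home_goals: 1, away_goals: 3 }
]

# One-hot encode a team name against a fixed, ordered list of teams.
def one_hot(team, teams)
  teams.map { |t| t == team ? 1 : 0 }
end

# Turn every match into [features, label]: home-team block, then away-team
# block, with label 1 for a home win and 0 otherwise.
def build_dataset(matches, teams)
  matches.map do |m|
    features = one_hot(m[:home], teams) + one_hot(m[:away], teams)
    label = m[:home_goals] > m[:away_goals] ? 1 : 0
    [features, label]
  end
end

teams = matches.flat_map { |m| [m[:home], m[:away]] }.uniq.sort
dataset = build_dataset(matches, teams)
dataset.each { |features, label| puts "#{features.inspect} -> #{label}" }
```

Because every match in the league becomes a row of the same training set, the model is no longer limited to the 6-8 meetings of one specific pair.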
Another idea would be to take data from other European leagues. Since teams are now a feature and not a class, it shouldn't add too much noise to your model, and you could benefit from the additional data (assuming that those features are valid in other leagues).
I have a similar system. A good source of data is football-data.co.uk.
I used the last N seasons for each league to build a model (believe me, more than 3 years is a must!). Depending on your criterion function (best fit or maximum profit), you can build your own prediction model.
One very good thing to know is that each league is different, and bookmakers give different home-win odds on the favorite in Belgium than in the 5th English league, where you can find real value odds, for instance.
From that you can build an interesting model, for example betting tips that beat the bookmakers on specific matches by using your patterns to find value bets. Or you can try to chase as many winning tips as you can, but possibly earn less (draws pay a lot of money even though fewer draw bets win).
Hopefully I gave you some ideas; for more, feel free to ask.
Don't know if this is still helpful, but features like Full-time goals, half time goals, passes, shots, yellows, reds, etc. are features that you don't have for the new match that you want to classify.
I would treat this as a classification problem (you want to classify the match into one of 3 categories: 1, X, or 2) and add more features that you can also compute for the new match, e.g. the number of missing players (due to injury/red cards), the number of wins/draws/losses each team has had in a row immediately BEFORE the match, which team is the home team (already mentioned), goals scored in the last few matches home and away, etc.
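As a rough illustration of a feature that is known before kick-off, the Ruby sketch below sums the points a team earned in its last few matches strictly before a given date. The match records, the `recent_form` name, and the 3/1/0 points scheme are assumptions made for the example.

```ruby
require 'date'

# Points a team earned in its last `n` matches played strictly before `before`
# (3 for a win, 1 for a draw, 0 for a loss). Uses only past information, so the
# same feature can be computed for a match that has not been played yet.
def recent_form(matches, team, before, n = 5)
  played = matches.select do |m|
    m[:date] < before && [m[:home], m[:away]].include?(team)
  end
  recent = played.sort_by { |m| m[:date] }.last(n)
  recent.sum do |m|
    own, other = m[:home] == team ? [m[:home_goals], m[:away_goals]] : [m[:away_goals], m[:home_goals]]
    if own > other
      3
    elsif own == other
      1
    else
      0
    end
  end
end

# Hypothetical results for illustration only.
history = [
  { date: Date.new(2013, 8, 17), home: 'A', away: 'B', home_goals: 2, away_goals: 0 },
  { date: Date.new(2013, 8, 24), home: 'C', away: 'A', home_goals: 1, away_goals: 1 }
]
puts recent_form(history, 'A', Date.new(2013, 8, 31))
```

The same pattern extends to home-only or away-only form by filtering on `m[:home]` or `m[:away]` alone.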
Having 6-8 matches is the real problem. This dataset is very small and there would be a lot of over-fitting, but if you use features like the ones I mentioned, I think you could also use older data.

Rails reporting objects

I am currently attempting to implement a reporting module in a rails app. Thanks to some assistance provided here: Ruby on Rails object reporting, I have decided to go down the road of coding common metrics and populating reports with these.
What I have to work out is how to create the metrics. Essentially, I need a metric object that I can use within my targeting framework (e.g. if I have a target object where target.value is 0.5, I can use target.metric_id to know which metric is being targeted, and thus report on it).
My problem is how to store the formula for the metric within the model structure. A simple example of a metric would be profit, where I could do sales.selling_prices.sum - sales.cost_prices.sum. How can I set up some columns that allow this formula to be stored? All formulas will be calculated using other objects, as in the profit example.
Any assistance would be greatly appreciated.
Thanks!
Depending on how ambitious your formulas get, you could start with something like this for metric:
operation_type:string, one of %w(add sub mult div)
left_operand:decimal
right_operand:decimal
Then, to calculate, you might have a method on metric like:
def result
  # Dispatch on the stored operation type.
  if operation_type == 'add'
    left_operand + right_operand
  elsif operation_type == 'sub'
    left_operand - right_operand
  # ... 'mult' and 'div' follow the same pattern
  end
end
When you create your metrics (maybe an admin panel of some kind) you could have ways of selecting the source inputs (for instance, left_operand is set to sales.selling_prices.sum, etc).
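One way this could look, as a plain-Ruby sketch rather than a definitive design: the metric stores only an operation name and two source names, and a lookup table maps each name to the code that computes it. `SOURCES`, `OPERATIONS`, the `Metric` struct, and the hash-based sales rows are all hypothetical stand-ins for real models and columns.

```ruby
# Hypothetical registry mapping a stored source name to code that computes it.
# In a Rails app these lambdas might wrap queries on the real models instead.
SOURCES = {
  'sales.selling_prices.sum' => ->(sales) { sales.sum { |s| s[:selling_price] } },
  'sales.cost_prices.sum'    => ->(sales) { sales.sum { |s| s[:cost_price] } }
}.freeze

# Stored operation name => Ruby operator.
OPERATIONS = { 'add' => :+, 'sub' => :-, 'mult' => :*, 'div' => :/ }.freeze

# Stand-in for the metric model: it stores names only, never raw Ruby code.
Metric = Struct.new(:operation_type, :left_source, :right_source) do
  def result(sales)
    left  = SOURCES.fetch(left_source).call(sales)
    right = SOURCES.fetch(right_source).call(sales)
    left.public_send(OPERATIONS.fetch(operation_type), right)
  end
end

profit = Metric.new('sub', 'sales.selling_prices.sum', 'sales.cost_prices.sum')
sales  = [{ selling_price: 10.0, cost_price: 6.0 }, { selling_price: 5.0, cost_price: 4.0 }]
puts profit.result(sales)
```

Storing names and resolving them through a whitelist avoids evaluating arbitrary strings from the database, and `fetch` raises a `KeyError` if a metric refers to an unknown source or operation.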

How to get descriptive statistics on questionnaire items by group using SPSS?

I have carried out an evaluation of a product using a Likert scale questionnaire and imported the data into SPSS. I have my columns arranged as follows:
ID, Group, Q1, Q2, Q3, Q4
I have two different groups completing the questionnaire, and each person has a different numerical ID. Under the Q columns, I have the score given by that person (from 1-5) on the Likert scale.
In all there are over 300 responses.
I am running analysis using 'descriptive statistics/frequencies' from the menubar and not getting the tables I am looking for. Basically, it is including all respondents together, whereas I would like it to compare the two groups in the tables.
How can I get descriptive statistics on questionnaire items by group using SPSS?
In addition, if you have any further tips as to what analysis I could perform on this type of data in SPSS I'd be most grateful. I'd like to show that there isn't a significant difference in opinions between the groups, and from looking at the data, it appears that this is the case.
One option: split the file by group, then run descriptive statistics as usual.
See this SPSS FAQ item from UCLA on how to analyze data by categories.
The short answer to your question is: crosstabs Q1 to Q4 by group. That will produce the table you want. Or, if you have the ctables package available, a more compact table will be produced by:
variable level group_id Q1 to Q4 (nominal).
ctables
/table Q1 + Q2 + Q3 + Q4 by group_id.
Either can be elaborated on to produce other statistics if wanted. It seems to me a chi-square test would be sufficient for your question.
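For readers who want to see the arithmetic behind such a test, here is a small Ruby sketch of the Pearson chi-square statistic for a group-by-response table. It only illustrates the calculation and is not SPSS output; the function names and the counts are made up, and it assumes no row or column of the table is all zeros.

```ruby
# Pearson chi-square statistic for a contingency table given as rows of counts
# (one row per group, one column per response category).
# Assumes every row and column total is non-zero.
def chi_square(table)
  row_totals = table.map(&:sum)
  col_totals = table.transpose.map(&:sum)
  total = row_totals.sum.to_f
  table.each_with_index.sum do |row, i|
    row.each_with_index.sum do |observed, j|
      expected = row_totals[i] * col_totals[j] / total
      (observed - expected)**2 / expected
    end
  end
end

# Degrees of freedom: (rows - 1) * (columns - 1).
def degrees_of_freedom(table)
  (table.size - 1) * (table.first.size - 1)
end

# Hypothetical counts of responses 1-5 to one question, by group.
q1_by_group = [
  [5, 12, 30, 60, 43],
  [7, 10, 28, 65, 40]
]
puts chi_square(q1_by_group).round(3)
puts degrees_of_freedom(q1_by_group)
```

The statistic is then compared against the chi-square distribution with the given degrees of freedom; a small value relative to that distribution is consistent with the two groups answering similarly.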
As for further analysis, it is a bit of an open-ended question that needs more focus to be answered effectively. I frequently suggest visual exploration for such exploratory analysis, and hence I would suggest perusing this question on the site, Visualizing Likert responses using R or SPSS, for potential ideas about how to visualize the responses. Another motivating post may be How to visualize 3D contingency matrix?.
There are a ton of other questions related to analyzing Likert responses on this site though, and it is difficult to give any more specific advice without a more specific motivation for the analysis.
While the above answers all have their good points, I usually prefer this procedure (type the following into a syntax window and Run):
means q1 to q4 by group/stat anova.
This will give you group means, sample sizes, and standard deviations as well as tests of the difference in means between the groups, for each of the variables Q1 to Q4. Of course, the tests will only give you valid results to the extent that your data meet the standard assumptions of anova. Some may say that variables measured on an ordinal 1-5 scale are not suitable for anova, and in academic contexts this is often true, but in business contexts most people are willing to sacrifice some rigor for the sake of convenience. It's much more convenient to compare 4x2=8 means than it is to compare the distributions of 4x5x2=40 categories of responses.
This can easily be done by using the "Crosstabs" function in SPSS for Windows:
Analyze --> Descriptive Statistics --> Crosstabs. Move the dependent variable(s) into the "Row(s)" box, then move the grouping variable into the "Column(s)" box, then click OK.

Resources