Struggling with a model because of a lot of data

I'm building an app to write wine tasting notes, and I have to translate this tasting framework (only the first page) into a model.
It's a lot of data and I'm not sure how to proceed. I tried to sketch a possible solution in this spreadsheet.
What would you suggest to do? Should I create only one model (Wine) with a column for each wine characteristic?
Thanks!
P.S. I'm learning web development, sorry if my question sounds trivial.

Perhaps not a direct answer to your question, but here are a few notes nevertheless:
I'd say for something like wine tasting, I would go for a combination of selects, tags and simple free-text.
This RailsCast should give you a good introduction to the acts-as-taggable-on gem for tagging.
One thing that may be handy when modelling your DB is to look at several wine tasting 'forms' that are already filled out and see if you can see a pattern. Say for Palate/Acidity I would expect the value to be one of light/medium/high, whereas for the Conclusion/Identity it could be pretty much anything.
You'll also need to find the right balance between restricting and allowing user input. I'd expect your users to be happier with free text and to feel more restricted by select/radio boxes. On the other hand, it is always easier (for you) to change from a select box to free text than the other way around. Not to mention that searching is much easier with selects or tags.
I don't think your question is that trivial, and I think you should simply try it and see. And design with the ability to change in mind.
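A minimal sketch of the select/tags/free-text mix described above, written in plain Python rather than Rails so it runs on its own. The class and field names (`TastingNote`, `acidity`, `aroma_tags`, `conclusion`) are hypothetical and not taken from the tasting framework; in a Rails app the same split would roughly correspond to an enum-style column, tagging (for example via acts-as-taggable-on) and a text column.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Fixed choices for a select box, e.g. Palate/Acidity."""
    LIGHT = "light"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class TastingNote:
    wine_name: str
    acidity: Level                                      # constrained: a select
    aroma_tags: set[str] = field(default_factory=set)   # semi-structured: tags
    conclusion: str = ""                                # unconstrained: free text

    def add_tag(self, tag: str) -> None:
        # Normalise so " Cherry " and "cherry" count as the same tag.
        self.aroma_tags.add(tag.strip().lower())


def find_by_tag(notes: list[TastingNote], tag: str) -> list[TastingNote]:
    """Searching by tag is a simple membership check."""
    wanted = tag.strip().lower()
    return [note for note in notes if wanted in note.aroma_tags]
```

Starting with the constrained fields as enumerations keeps the later switch to free text cheap, which is the direction of change the answer calls the easier one.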

Related

How should BDD statements be properly constructed? Is there a convention used in teams?

Is there a preferred way of creating BDD scenarios in small agile teams and amongst the community? I'm using courgette and it gives an example on https://courgette-testing.com/bdd
Scenario: Refunded items should be returned to stock
  Given a customer previously bought a black sweater from me
  And I have three black sweaters in stock.
  When they return the black sweater for a refund
  Then I should have four black sweaters in stock.
Does this sound like a good idea? Would this work well for communication in teams?
I've used their web steps bit, and am now doing the refactor bit to make it clear to the business.
Any links would help. Thanks

The conversations in BDD are more important than the tools. Rather than starting with the finely-grained specification in Courgette's example, try talking to the business first. Ask them for an example of the kind of behaviour they want.
When you write it down, start by just writing it the way they describe it. It's amazing how few people listen properly! After you've got the example from them, take a look at it. Can you see which bits are the contexts (Givens) and which are the outcomes (Thens)? Which is the step that's associated with triggering the behaviour you're interested in (Whens)?
Once you've worked that out, there are a couple more questions I like to ask:
Is there any other context which, for this same event, gives a different outcome?
Is there any other outcome that's important?
For instance, if I was implementing this behaviour for a big supermarket, I might come across an example like:
"Oh! No, don't add food back to stock. We don't know how it's been stored. We refund it if there's something wrong with it, but we bin it."
You can probably see how that might change your code!
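To make that concrete, here is a small sketch of how the same event (a refund) can lead to different outcomes depending on context. It is plain Python, not Courgette or any BDD tool, and the `Shop` class, its fields and the "perishable" rule are assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shop:
    stock: dict[str, int] = field(default_factory=dict)
    binned: dict[str, int] = field(default_factory=dict)
    perishable: set[str] = field(default_factory=set)

    def refund(self, item: str) -> None:
        """Handle a returned item (the 'When' step)."""
        if item in self.perishable:
            # Supermarket rule: refunded food is binned, never restocked.
            self.binned[item] = self.binned.get(item, 0) + 1
        else:
            self.stock[item] = self.stock.get(item, 0) + 1


def test_refunded_sweater_returns_to_stock():
    shop = Shop(stock={"black sweater": 3})                # Given
    shop.refund("black sweater")                           # When
    assert shop.stock["black sweater"] == 4                # Then


def test_refunded_food_is_binned():
    shop = Shop(stock={"milk": 10}, perishable={"milk"})   # Given
    shop.refund("milk")                                    # When
    assert shop.stock["milk"] == 10                        # Then
    assert shop.binned["milk"] == 1
```

The second test only exists because someone asked whether another context gives a different outcome; without that conversation the `perishable` branch would never have been written.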
Testers are really great at asking these questions and spotting missing scenarios! This leads us to the "Three Amigos" pattern. I like to include:
A business person, Product Owner, subject matter expert or person with the problem
A tester
A dev (or a pair of devs).
You can also include UI designers, technical writers, etc. - Matt Wynne says it's "Three Amigos where three is a number between 3 and 7".
I really like it when the developer writes the scenarios down, in any form that allows them to get to the "Given, When, Then". Sometimes I'll do it in the meeting; sometimes I do it later and show it or send it to my business person.
Courgette's example is something that typically happens when people don't have these conversations. If you start with the conversations, you're much more likely to get something that matches the above. Not only are those declarative steps easier for business to read and for the whole team to talk about, but they're also easier to maintain, as the detail of how they're achieved is hidden (usually in Step Definitions, and further in Page Objects).
There's all kinds of useful posts for BDD newcomers on my blog if you want to know more!

How to think about models and relationships in more complex Rails applications?

I've been learning Ruby on Rails for quite some time now and have built several toy applications. I've taken many classes/courses (i.e., Hartl, Code School, Udemy, etc.). Now I'm working on a pet project that is fairly complex - many models and relationships.
Here's my question: How do I go about thinking about a complex application in terms of models and relationships? It seems to me there should be some visual way to model all of this, but I haven't seen any discussion of such modeling in any of the classes I've taken. Sure, there are the very simple diagrams in the Rails Guides, but how do I extend this to something more complex? Or am I making this too complicated? Do I just start coding models and relationships and see where it takes me? It seems this ad hoc approach could easily paint me into a corner where I'd have to start over from scratch.
Are there tools or blog posts that can help me?
(Note: I've also posted this question at Reddit.Com/r/Rails at https://www.reddit.com/r/rails/comments/7c9zbf/how_to_think_about_rails_web_application/)

It sounds like what you're after is a schema designer. I'll link one at the bottom of this post. A schema designer will allow you to visualize all the relationships in your application and see how one model connects to another. They're really helpful when writing complex DB queries. I've attached an example of a fairly simple design just to give you an idea of what they do. You can also add all the columns a model has to the design; I usually use it more for relationships.
http://ondras.zarovi.cz/sql/demo/

Purely based on my experience/opinion...
You'll end up changing your database design a couple of times as you build your app.
I would start with a whiteboard, then consider using a schema designer as @aram mentioned.
After you see the big picture, just start with the relationships that you need for today's features. You can keep referring to the original design to see the bigger picture, but you don't want to bloat your architecture before you need to, because it will change.
After you write some code and build those relationships, you can spot check yourself using the rails-erd gem to programmatically generate your app's schema.

How precise should user stories be?

I've just started using SpecFlow. It's a tool for creating business understandable test scenarios in a BDD manner. Basically it transforms user stories to unit tests.
I'm a beginner with user stories and I wonder about their length. Is it good practice to create very precise user stories? Here's an example:
In order to get help
As a StackOverflow user
I want to add post
with name and content
and add tags to it
and format the content
and the information about my post edits to be stored in the system
and some more things like that
Should I keep my stories compact? If so, how can I manage detailed requirements? Or is there nothing wrong with a very long and precise "I want" section in a user story?

If you could develop an entire system in a couple of weeks, and do that reliably, nobody would ever worry about "user stories". They'd just get you to develop the system, sit with you, and tweak it as it went.
User stories only exist in order to get feedback from people who can't be with you all the time, and to help you learn what it is that your users (and other stakeholders) really want.
Here's how I treat a list like this:
In order to get help
As a StackOverflow user
I want to add post
with name and content
and add tags to it
and format the content
and the information about my post edits to be stored in the system
You want to get help. Which of these actually add to your ability to get help? Is it you wanting help, or do you want to offer help to other people? Do you want recognition for the help you're offering other people? The top part of this seems false (and it's why it's really difficult to have these conversations with fake requirements).
I think there are multiple requirements here, and far beyond the scope of just one user story. With an analyst hat on, here's how I might break this down:
In order to award great content with appropriate recognition,
as Stack Exchange,
we want people's usernames to appear with their content.
Of course, the users want this too, but they're not paying for it (except through adverts). So work out who's paying for this, and why.
In order to get more page impressions and keep people on the site for longer,
as Stack Exchange,
we want users to be able to find similar content really easily.
Hm. This one's a bit trickier. See, the user doesn't really want to spend their entire life on StackOverflow. It's just that if we give them the appropriate recognition, and make it easier for others to find their content, they might do that. Not all "user stories" actually benefit users. Find out who's paying for them, and why; then you find your real stakeholder. It's also OK for a story to benefit more than one stakeholder, and it's easy to see how to rephrase this from the user's point of view as well.
format the content
Honestly not sure about this one. It might be about being able to emphasise important points, etc. There are a ton of aesthetic ideals that don't lend themselves well to BDD and automated scenarios. Sometimes the only way to do this is to try, and get feedback.
In order to avoid retyping my request every time
As the user
I want the information about my post edits to be stored in the system
Well, yes, that would be nice.
The thing is that each of these can be developed independently. If you can think of any feature, any item that you could get rid of and still have the release be valuable, put it in a separate story.
If you can replace "I want to..." with "I want to be able to..." it's likely that what you have there isn't a story, but an entire capability. Most people do this instinctively. Lots of people call those "epics".
I've just shown you how I break them down. It's a pretty simple process.
First, look at your requirements. If there's anything for which you can say, "I want to be able to..." or "Someone wants to be able to..." then you know that's a completely different capability, which means it's going to be a separate story.
You can then separate those into contexts. So you might have stories like:
In order to free up our junior traders
We want them to be able to generate contracts automatically
So that they can help with the trade analysis instead of typing.
If that seems too big for the feedback cycle (typically a two-week sprint), you can divide it further.
In order to free up our junior traders
We want them to be able to generate *orange juice* contracts automatically
So that they can help with the trade analysis instead of typing.
Here, we're focusing on being able to trade orange juice, but we could equally narrow the story down to the FTSE, or the US, or the NY stock exchange. This is how we focus the efforts on the thing that will deliver: protecting revenue, lowering costs or generating value.
To turn these into scenarios, I ask, "Can you give me an example of an OJ trade on the NY stock exchange?" If I see anything generic that I don't understand, I ask, "Can you give me an example of that?"
That example becomes my first scenario. The context (given) is defined by the limits of the story. The event (when) is the performance of the capability. The outcome (then) is the resulting value.
In answer to your question - yes, I think it's important to create precise user stories. That means knowing why it's valuable, defining the context that you're going to cover, and suggesting an example of what the outcome might be.
The example you gave is more than just one story, though. It's not precise enough. Hopefully the advice here will help you to narrow stories down to something useful. One or two days is a good length for a story, but if you're starting down this path and find they're a bit longer, that's OK.
Your changes are also stories.

I always advise the following:
Try cutting your stories into scenarios. The more scenarios, the better you can pinpoint where something is going wrong. Give all scenarios subjective names.
Take your example as a single test: if step 1 goes wrong, none of your other steps get tested.
Also use the Given, When and Then keywords to make your scenarios easy to read.
So instead, you could say:
Feature: As a StackOverflow user I want to add a post
  Scenario: I go to stackoverflow website
    Given I open the browser
    And I go to the stackoverflow website
    When I click New Post
    Then a new page appears to insert my data
  Scenario: I fill in data for my post - Name and content
    Given I do not modify this page
    When I fill in name
    And I fill in content
    Then I add tags to it
    And I format the content
  Scenario: Check if information about post edits are stored in the system
    Given...
Guess you will get where this is going :-)

There is no right detail level of user stories, as user stories shrink in size (scope) and grow in detail (specifications) over time. This slide shows a nice visualization from Gojko Adzic about this: http://www.slideshare.net/chassa/2015-0214agile-reqend2endcomplete/6
As for how precise and detailed a Gherkin scenario should be: scenarios should reveal interesting aspects of the user story to be implemented. They should use concrete (key) examples rather than abstract descriptions. The examples should focus on the aspect that is being illustrated. The scenario title should be an abstract description of the rule or aspect that is illustrated with the example(s) provided in the scenario.
You usually start with a main aspect (happy path) scenario, and then try to “break the model” by coming up with new examples (cases) that explore other aspects of the story. You start by asking the questions “How would you try out the story when it was implemented?” (happy path) and “What should happen if …?” to collect potential scenarios to consider (probably defining some of the questions to be out of scope for this story).
After that, you’re trying to answer these questions (scenario title) and illustrate them with concrete examples (scenario steps). This slide gives an idea of “break the model”: http://www.slideshare.net/chassa/2015-0214agile-reqend2endcomplete/61

Intelligently extracting tags from blogs and other web pages

I'm not talking about HTML tags, but tags used to describe blog posts, or youtube videos or questions on this site.
If I was crawling just a single website, I'd just use an xpath to extract the tag out, or even a regex if it's simple. But I'd like to be able to throw any web page at my extract_tags() function and get the tags listed.
I can imagine using some simple heuristics, like finding all HTML elements with id or class of 'tag', etc. However, this is pretty brittle and will probably fail for a huge number of web pages. What approach do you guys recommend for this problem?
Also, I'm aware of Zemanta and Open Calais, which both have ways to guess the tags for a piece of text, but that's not really the same as extracting tags real humans have already chosen. But I would still love to hear about any other services/APIs to guess the tags in a document.
EDIT: Just to be clear, a solution that already works for this would be great. But I'm guessing there's no open-source software that already does this, so I really just want to hear from people about possible approaches that could work for most cases. It need not be perfect.
EDIT2: For people suggesting a general solution that usually works is impossible, and that I must write custom scrapers for each website/engine, consider the arc90 readability tool. This tool is able to extract the article text for any given article on the web with surprising accuracy, using some sort of heuristic algorithm I believe. I have yet to dig into their approach, but it fits into a bookmarklet and does not seem too involved. I understand that extracting an article is probably simpler than extracting tags, but it should serve as an example of what's possible.

Systems like the arc90 example you give work by looking at things like tag/text ratios and other heuristics. There is sufficient difference between the text content of a page and the surrounding ads, menus, etc. Other examples include tools that scrape emails or addresses. There, too, there are patterns that can be detected and locations that can be recognized. In the case of tags, though, you don't have much to help you uniquely distinguish a tag from normal text; it's just a word or phrase like any other piece of text. A list of tags in a sidebar is very hard to distinguish from a navigation menu.
Some blogs, like Tumblr, have tags whose URLs contain the word "tagged", which you could use. WordPress similarly has ".../tag/..." type URLs for tags. Solutions like this would work for a large number of blogs, independent of their individual page layout, but they won't work everywhere.
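The URL heuristic above could be sketched roughly as follows, using only the Python standard library. The marker list (`tag`, `tagged`) comes from the two URL shapes mentioned in this answer; everything else, including the `extract_tags` name borrowed from the question and the choice to return the raw URL slug, is an assumption for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

# Path segments that usually precede a tag name, per the URL shapes above.
TAG_PATH_MARKERS = ("tag", "tagged")


class TagLinkParser(HTMLParser):
    """Collects tag names from links shaped like .../tag/<name>."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        segments = [s for s in urlparse(href).path.split("/") if s]
        # Look at each (segment, next segment) pair in the path.
        for marker, name in zip(segments, segments[1:]):
            if marker in TAG_PATH_MARKERS:
                candidate = unquote(name).lower()
                if candidate not in self.tags:
                    self.tags.append(candidate)


def extract_tags(html):
    parser = TagLinkParser()
    parser.feed(html)
    return parser.tags
```

This returns slugs such as `red-wine` rather than display names, and it will miss any site that does not put tags in the URL path, which is the limitation the answer already points out.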

If the sources expose their data as a feed (RSS/Atom) then you may be able to get the tags (or labels/categories/topics etc.) from this structured data.
Another option is to parse each web page and look for tags formatted according to the rel=tag microformat.
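Both options could look something like the sketch below, again with only the standard library. It assumes an RSS 2.0 feed with `<category>` elements inside each `<item>` (Atom feeds, which use namespaced elements, are not handled), and, as I understand the rel=tag microformat, it takes the tag name from the last path segment of the link's `href` rather than from the link text. The function and class names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


def tags_from_rss(feed_xml):
    """Return the <category> text of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    categories = root.findall("./channel/item/category")
    return [c.text.strip() for c in categories if c.text and c.text.strip()]


class RelTagParser(HTMLParser):
    """Collects tags from <a rel="tag" href=".../<name>"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of link types.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "tag" not in rel_values:
            return
        path = urlparse(attributes.get("href") or "").path
        segments = [s for s in path.split("/") if s]
        if segments:
            self.tags.append(unquote(segments[-1]))


def tags_from_rel_tag(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags
```

Trying the feed first and falling back to the page markup seems a reasonable order, since structured data is less likely to pick up navigation links by mistake.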

Damn, was just going to suggest Open Calais. There's going to be no "great" way to do this. If you have some target platforms in mind, you could sniff for Wordpress, then see their link structure, and again for Flickr...

I think your only option is to write custom scripts for each site. To make things easier, though, you could look at AlchemyApi. They have similar entity extraction capabilities to OpenCalais, but they also have a "Structured Content Scraping" product, which makes it a lot easier than writing xpaths by using simple visual constraints to identify pieces of a web page.

This is impossible because there isn't a well-known, widely followed specification. Even different versions of the same engine could produce different output, and with WordPress a user can even create their own markup.
If you're really interested in doing something like this, you should know it's going to be a really time-consuming, ongoing project: you'd create a library that detects which "engine" a page uses and parses it accordingly. If you can't parse a page for some reason, you add new rules and move on.
I know this isn't the answer you're looking for, but I really can't see another option. I'm into Python, so I would use Scrapy for this since it's a complete framework for scraping: it's complete, well documented and really extensible.

Try making a Yahoo Pipe and running the source pages through the Term Extractor module. It may or may not give great results, but it's worth a try. Note - enable the V2 engine.

Looking at arc90, it seems they also ask publishers to use semantically meaningful markup [see https://www.readability.com/publishers/guidelines/#view-exampleGuidelines] so they can parse it rather easily. Presumably, though, they have either developed generic rules such as the tag/text ratios @dunelmtech suggested, which can work for article detection, or they use a combination of text-segmentation algorithms from the Natural Language Processing field, such as TextTiler and C99, which could be quite useful for article detection. See http://morphadorner.northwestern.edu/morphadorner/textsegmenter/ and search Google Scholar for more on both, as they are published in the academic literature.
However, detecting "tags" as you require seems to be a difficult problem (for the reasons already mentioned in the comments above). One approach I would try is to use one of the text-segmentation algorithms (C99 or TextTiler) to detect the article start/end and then look for DIVs / SPANs / ULs with CLASS or ID attributes containing ..tag.. in them. Since, in terms of page layout, tags tend to sit underneath the article and just above the comment feed, this might work surprisingly well.
Anyway, would be interesting to see whether you got somewhere with the tag detection.
Martin
EDIT: I just found something that might be really helpful. The algorithm is called VIPS [see: http://www.zjucadcg.cn/dengcai/VIPS/VIPS.html], which stands for Vision Based Page Segmentation. It is based on the idea that page content can be visually split into sections. Compared with DOM-based methods, the segments obtained by VIPS are much more semantically aggregated. Noisy information, such as navigation, advertisements and decoration, can be easily removed because it is often placed in certain positions on a page. This could help you detect the tag block quite accurately!

There is a term extractor module in Drupal (http://drupal.org/project/extractor), but it's only for Drupal 6.

Intelligent text parsing and translation

What would be an intelligent way to store text so that it can be intelligently parsed and translated later on?
For example, The employee is outstanding as he can identify his own strengths and weaknesses and is comfortable with himself.
The above could be the generic text which is shown to the user prior to evaluation. If the user is male (say Shaun) or female (say Mary), the above text should be translated as follows.
Mary is outstanding as she can identify her own strengths and weaknesses and is comfortable with herself.
Shaun is outstanding as he can identify his own strengths and weaknesses and is comfortable with himself.
How do we store the evaluation criteria in the first place, with appropriate placeholders or tokens? (In the above case, "employee" should be replaced with the employee's name and, based on their gender, the words he or she and himself or herself need to be substituted.)
Is there a mechanism to automatically translate the text with the above information?

The basic idea of doing something like this is called Mail Merge.
This page seems to discuss how to implement something like this in Ruby.
[Edit]
A google search gave me this - http://freemarker.org/
I don't know much about this library, but it looks like what you need.

This is a very broad question in the field of Natural Language Processing. There are numerous ways to go about it, and the questions you asked seem too broad.
If I understand part of your question correctly, it could be done this way:
#variable{name} is outstanding as #gender{he/she} can identify #gender{his/her} own strengths and weaknesses and is comfortable with #gender{himself/herself}.
Or:
#name is outstanding as #he can identify #his own strengths and weaknesses and is comfortable with #himself.
... if gender is the major problem.
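A minimal sketch of the second, token-based variant, using Python's built-in `str.format` placeholders in place of the `#name` / `#he` syntax above. The `PRONOUNS` table, the `render` function and the `"male"` / `"female"` keys are assumptions for illustration; a real system would also need to deal with capitalisation at the start of a sentence and with other languages.

```python
# Each token is named after its masculine form and mapped per gender.
PRONOUNS = {
    "male": {"he": "he", "his": "his", "himself": "himself"},
    "female": {"he": "she", "his": "her", "himself": "herself"},
}

TEMPLATE = (
    "{name} is outstanding as {he} can identify {his} own strengths "
    "and weaknesses and is comfortable with {himself}."
)


def render(template, name, gender):
    """Fill in the name and the pronoun tokens for the given gender."""
    return template.format(name=name, **PRONOUNS[gender])


print(render(TEMPLATE, "Mary", "female"))
print(render(TEMPLATE, "Shaun", "male"))
```

Storing the criteria as templates like `TEMPLATE` keeps the text itself editable by non-programmers, while the pronoun table stays in one place.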

I have had some experience working with a tool called Grammatica, when building a custom user input excel like formula parsing and evaluation engine. It may not be to the level of sophistication you're looking for but it's a start. This basically uses many of the same concepts that popular code compiler parsers employ. It's definitely worth checking out.

I agree with Kornel, this question is too broad. What you seem to be talking about is semantics, for which RDF and OWL can be a good starting point. Read about modeling semantics using markup and you can work your way up from there.
