DTD XML parsing - xml-parsing

If I have:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE country[
<!ELEMENT country
(president | king | (king,queen) | queen)>
<!ELEMENT president (#PCDATA)>
<!ELEMENT king (#PCDATA)>
<!ELEMENT queen (#PCDATA)>
]>
Why does (president | king | (king,queen) | queen)> generate an error? If we try to validate
<country><king>Luis</king></country>
we get the error message [...]Both 1st and 2nd occurence of "king" are possible. What if I write (president | (king) | (king,queen) | queen)> instead?

It's because your content model is non-deterministic. This means that given the king element, the parser cannot determine which model is being matched without looking ahead. See Deterministic Content Models (Non-Normative) for more details.
What I would do is make queen optional when a king is present:
<!ELEMENT country (president | (king,queen?) | queen)>
Response to comment...
The XML processor cannot use "look ahead" in order to figure out what is gonna "happen" after matching "king", right?
Right. For example, let's say we have this country element:
<country>
<king/>
</country>
and we declare country like this in our DTD:
<!ELEMENT country (president | king | (king,queen) | queen)>
there are 4 possible options for the content of country:
one "president"
one "king"
one "king" followed by one "queen"
one "queen"
So if we have a king element in our XML, the parser doesn't know if it is option #2 or option #3.
If we declare country like this:
<!ELEMENT country (president | (king,queen?) | queen)>
there are 3 possible options for the content of country:
one "president"
one "king" followed by zero or one "queen"
one "queen"
As you can see, if we have a king element in our XML there is only one possible option that the parser can choose.
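The ambiguity can be illustrated with a small Python sketch. This is not how a validating parser is implemented, and it is not a full check of the determinism rule: it assumes a flat choice whose alternatives are simple sequences with a required first element, and it only reports element names that start more than one alternative. The function name and the tuple representation are made up for the illustration.

```python
from collections import defaultdict

def ambiguous_first_elements(alternatives):
    """Return element names that begin more than one alternative.

    Each alternative is a tuple of particles such as ("king", "queen?").
    Simplification: the first particle of every alternative is assumed
    to be required, so only that particle is inspected.
    """
    starts = defaultdict(list)
    for alt in alternatives:
        # Drop an occurrence indicator (?, *, +) to get the bare element name.
        first = alt[0].rstrip("?*+")
        starts[first].append(alt)
    return {name: alts for name, alts in starts.items() if len(alts) > 1}

# (president | king | (king,queen) | queen)
original = [("president",), ("king",), ("king", "queen"), ("queen",)]
# (president | (king,queen?) | queen)
fixed = [("president",), ("king", "queen?"), ("queen",)]

print(ambiguous_first_elements(original))  # "king" starts two alternatives
print(ambiguous_first_elements(fixed))     # {} : each first element is unique
```

With the original declaration, a king start tag fits both the second and third alternatives, which is what the error message is reporting. With the rewritten declaration each start tag selects exactly one alternative.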

Related

Which Starspace training mode to use for multi-level embeddings

I am using the StarSpace embedding framework for the first time and am unclear on the "modes" that it provides for training and the differences between them.
The options are:
wordspace
sentencespace
articlespace
tagspace
docspace
pagespace
entityrelationspace/graphspace
Let's say I have a dataset that looks like this:
| Author | City | Tweet_ID | Tweet_contents |
|:-------|:-------|:----------|:-----------------------------------|
| A | NYC | 1 | "This is usually a short sentence" |
| A | LONDON | 2 | "Another short sentence" |
| B | PARIS | 3 | "Check out this cool track" |
| B | BERLIN | 4 | "I like turtles" |
| C | PARIS | 5 | "It was a dark and stormy night" |
| ... | ... | ... | ... |
(In reality, my dataset is not language data and looks nothing like this, but this example demonstrates the point well enough.)
I would like to simultaneously create embeddings from scratch (not using pre-existing embeddings at any point) for each of the following:
Authors
Cities
Tweets/Sentences/Documents (e.g. 1, 2, 3, 4, 5, etc.)
Words (e.g. 'This', 'is', 'usually', ..., 'stormy', 'night', etc.)
Even after reading the documentation, it isn't clear which 'mode' of StarSpace training I should be using.
If anyone could help me understand how to interpret the modes to help select the appropriate one, that would be much appreciated.
I would also like to know if there are conditions under which the embeddings generated using one of the modes above would in some way be equivalent to the embeddings built using a different mode (ignoring the fact that the embeddings would differ because of the non-deterministic nature of the process).
Thank you

Gherkin Scenario Outline using the same table multiple times

I'm writing a Scenario Outline in Specflow for Visual Studio. The objective is to test a Person Name comparer feature, in order to choose the best name between the two.
In my case, I have properties that belong to the names and properties external to them, which belong to the Person entity.
The comparison flow is made in two parts: first I check the properties of the persons (owners of the names) to decide, and if that doesn't yield a result (meaning their properties are the same), then I check the names' properties.
I've written separate tests for the names' properties comparison, so in this test I only care about the Person properties and the relation between the names - which can be: Name1 < Name2, Name1 > Name2 or Name1 ≡ Name2.
By now I have written a scenario outline for each of those three cases, since I need to run each of the parameters in my Examples table once for each of those cases.
The code looks something like this:
Scenario Outline: Comparing names
Given I have a first name <name1>
And the first person has properties <properties1>
And I have a second name <name2>
And the second person has properties <properties2>
When I choose the best name
Then the best name should be <best name>
Examples:
| properties1 | properties2 |
| FirstName:"Carlos" | FirstName:"Johny" |
| LastName:"Smith" | FirstName:"Johny" |
| FirstName:"John",LastName:"Smith" | LastName:"Smith" |
Now in place of the names, I wrote this three times, once for each case of the relation between the names, with the names hard-coded in the scenario.
Ideally, I would like to have a table of tables, so that a primary parameter is run with every line of the table.
Any idea how to implement that without having three different Scenario Outlines?
A SpecFlow table for creating each person might be the ideal solution. This allows you to pass values for each name, or a null value:
Scenario Outline: Comparing names
Given I have a first name <name1>
And the first person has properties:
| Field | Value |
| First Name | <first name 1> |
| Last Name | <last name 1> |
And I have a second name <name2>
And the second person has properties:
| Field | Value |
| First Name | <first name 2> |
| Last Name | <last name 2> |
When I choose the best name
Then the best name should be <best name>
Examples:
| name1 | first name 1 | last name 1 | first name 2 | last name 2 | best name |
| X | Carlos | | Johnny | | X |
| X | | Smith | Johnny | | X |
| X | John | Smith | | Smith | X |
The advantage of this approach is you can expand the properties you set on each person.

Need to do complex lookups and referencing of data across differently sized samples in Google Sheets

So, some background: I have a Google Sheets document with several sheets of information.
Two sheets hold data that needs to be checked against one another.
Sheet A is a list of contact data (and many other columns of classifying data that is irrelevant to this process, but makes it difficult to do all of this with a single sheet).
Sheet B is a shorter list of names (all of whom are on Sheet A), a date and a dollar amount for each.
When data is entered into B, we enter either the street address or email they provide us at the time as a way of ensuring that two people with the same name do not get mis-matched. If an email is provided, we try to use that.
I need to create a third sheet (C) for easily displaying the data from B with the contact information of the corresponding names stored in A.
While this may not seem complex, I have been running into a lot of issues making this work.
example:
Sheet A: Contact data
Name | Street Address | City | Prov | Postal | Email |
John Smith | 123 Smith Lane | Smithtown | ON | x0x 0x0 | [blank] |
Jane Doe | [blank] | [blank] | [blank] | [blank] | Doe@doecorp.co |
Tim Philips | 111 Philips Crt | Phillipston | ON | z2z 2z2 | [blank] |
Joe Test | [blank] | [blank] | [blank] | [blank] | Joe@testorg.ca |
Sheet B: Donations
Name | Street Address | Email | Date received | Amount (2018)
John Smith | [blank] | smith@smithorg.org | 20/NOV/2018 | $175 |
Joe Test | [blank] | joe@testorg.ca | 15/OCT/2018 | $200 |
Sheet C: output for mail merging (Filter-sorted by email addresses)
Name | Address | Email | Date received | Amount (2018) |
Joe Test | [blank] | joe@testorg.ca | 15/OCT/2018 | $200 |
John Smith | 123 Smith Lane Smithtown, ON x0x 0x0 | [blank] | 20/NOV/2018 | $175 |
Ideally, the individual mailing address fields from A would be pulled and combined into a single cell in the end result (we use the others to sort and filter our data).
This should put you on the right track, click on the link for the whole spreadsheet.

How to achieve conditional formatting of names between pages?

I have a Google Sheet with one page for team leaders to list their desired team members (first names, last names, emails) by row--each TL fills in one row--, and a second page where team members are listed who have actually registered with my program.
Page 1
+------------------------+------------+---------------+------------+--------------------+
| Team Leader First Name | First Name | Email Address | First Name | Email Address |
+------------------------+------------+---------------+------------+--------------------+
| Danielle | Elizabeth | XXX@tamu.edu | Matthew | XXX@tamu.edu |
| Stoian | William | XXX@tamu.edu | Victoria | XXX@email.tamu.edu |
| Christa | Olivia | XXX#tamu.edu | | |
+------------------------+------------+---------------+------------+--------------------+
Page 2
+--------------------+-------------------------+
| Scholar First Name | Scholar Preferred Email |
+--------------------+-------------------------+
| elizabeth | xxx@gmail.com |
| william | xxx@tamu.edu |
+--------------------+-------------------------+
I want to be able to see at a glance which of the names listed by the TL on pg 1 have not registered and thus don't appear on pg 2.
In the example above, I want Olivia's, Matthew's, and Victoria's names to appear red because they do not show up on pg 2 (which means they still need to register). Everyone else should appear normally.
I tried at first to importrange from pg1 to get a clean list of the team members, then conditional formatting to match against pg2, the idea I had being it shows up red if a name is not found.
Use IMPORTRANGE to bring the scholar first names from page 2 into F12:F14 on page 1.
Conditional formatting rule 1: apply to range B2:B999 (the first "First Name" column on page 1)
=NOT(OR(ISNUMBER(MATCH(TRIM(B2),$F$12:$F$13,0)),ISBLANK(B2)))
Conditional formatting rule 2: apply to range D2:D999 (the second "First Name" column)
=NOT(OR(ISNUMBER(MATCH(TRIM(D2),$F$12:$F$13,0)),ISBLANK(D2)))
Note: Instead of importing, you could also reference the second sheet using INDIRECT.
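To make the logic of the formula easier to follow, here is a rough Python equivalent. It is only a sketch of the decision the rule makes, not something Sheets runs. It assumes that MATCH compares text case-insensitively (the example relies on this, since "Elizabeth" must match "elizabeth") and that an empty string stands in for a blank cell. The function name needs_highlight is made up for the illustration.

```python
def needs_highlight(cell, registered):
    """Mirror NOT(OR(ISNUMBER(MATCH(TRIM(cell), registered, 0)), ISBLANK(cell)))."""
    if cell == "":
        # ISBLANK: empty cells are never highlighted.
        return False
    # TRIM: drop leading/trailing spaces and collapse repeated spaces.
    name = " ".join(cell.split())
    # MATCH(..., 0): exact match, assumed here to ignore case.
    found = any(name.casefold() == r.casefold() for r in registered)
    return not found

registered = ["elizabeth", "william"]
for first_name in ["Elizabeth", "Matthew", "William", "Victoria", "Olivia", ""]:
    print(repr(first_name), needs_highlight(first_name, registered))
```

Only Matthew, Victoria and Olivia come back True, which matches the names that should turn red in the example.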

Behave - Common features between applications, avoiding duplication

I have many applications which I want to test, which have a largely overlapping set of features. Here is an oversimplified example of a scenario I might have:
Given <name> is playing a game,
When they shoot at a <color> target
Then they should <event>
Examples:
| name | color | event |
| Alice | red | hit |
| Alice | blue | miss |
| Bob | red | miss |
| Bob | blue | hit |
| Bob | green | hit |
It's a silly example, but suppose really I have a lot of players with different hit/miss conditions, and I want to run just the scenarios for a given name? Say, I only want to run the tests for Alice. There's still advantage to having all the hit/miss tests in a single Scenario Outline (since, after all, they're all closely related).
One approach would be to just duplicate the test for every name and tag them, so something like:
@Alice
Given Alice is playing a game
When she shoots at a <color> target
Then she should <event>
Examples:
| color | event |
| red | hit |
| blue | miss |
This way I can run behave --tags @Alice, but then I'm repeating the same scenario for every user, and that's a lot of duplication. Is there a good way to still compress all the examples into one scenario, but only selectively run some of them? What's the right approach here?
Version 1.2.5 introduced better ways to distinguish scenario outlines. It is now possible to uniquely distinguish them and thus select a unique scenario generated from an outline with --name= at the command line. For instance, suppose the following feature file:
Feature: test
Scenario Outline: test
Given <name> is playing a game,
When they shoot at a <color> target
Then they should <event>
Examples:
| name | color | event |
| Alice | red | hit |
| Alice | blue | miss |
| Bob | red | miss |
| Bob | blue | hit |
| Bob | green | hit |
Let's say I want to run only the test for Bob, red, miss. It is in the first table, 3rd row. So:
behave --name="@1.3"
will select this test. In version 1.2.5 and subsequent versions, a generated scenario gets a name which includes "@<table number>.<row number>", where <table number> is the number of the table (starting from 1) and <row number> is the number of the row.
This won't easily allow you to select all scenarios that pertain to a single user. However, you can achieve it in another way. You can split your examples in two:
Examples: Alice
| name | color | event |
| Alice | red | hit |
| Alice | blue | miss |
Examples: Bob
| name | color | event |
| Bob | red | miss |
| Bob | blue | hit |
| Bob | green | hit |
The table names will appear in the generated scenario names and you could ask behave to run all the tests associated with one table:
behave --name="Alice"
I do not know of a way to access the example name in steps and thus get rid of the first column.
The full set of details is in the release notes for 1.2.5.
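As a rough illustration of why both selections work, the sketch below simulates the naming and filtering in plain Python. It does not call behave. It assumes generated names follow the pattern "<outline name> -- @<table number>.<row number> <examples name>" and that --name is applied as a regular-expression search over the scenario name. SCHEMA, generated_names and select are hypothetical helpers written for this example.

```python
import re

# Assumed naming pattern for scenarios generated from an outline.
SCHEMA = "{name} -- @{row_id} {examples_name}"

def generated_names(outline_name, tables):
    """Build one scenario name per example row; tables and rows count from 1."""
    names = []
    for t_index, (examples_name, rows) in enumerate(tables, start=1):
        for r_index, _row in enumerate(rows, start=1):
            row_id = f"{t_index}.{r_index}"
            names.append(SCHEMA.format(
                name=outline_name, row_id=row_id, examples_name=examples_name
            ).strip())
    return names

def select(names, pattern):
    """Keep the names in which the pattern is found anywhere (regex search)."""
    regex = re.compile(pattern)
    return [n for n in names if regex.search(n)]

tables = [
    ("Alice", [("Alice", "red", "hit"), ("Alice", "blue", "miss")]),
    ("Bob", [("Bob", "red", "miss"), ("Bob", "blue", "hit"), ("Bob", "green", "hit")]),
]
names = generated_names("test", tables)

print(select(names, "Alice"))   # ['test -- @1.1 Alice', 'test -- @1.2 Alice']
print(select(names, r"@2\.1"))  # ['test -- @2.1 Bob']
```

Note that once the examples are split into two tables, the row for Bob, red, miss becomes the first row of the second table, so its identifier changes from 1.3 to 2.1.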
