Rails object.to_yaml without leading dashes - ruby-on-rails

Every to_yaml output has three leading dashes:
---
a:
  b:
    c: something
How can I convert an object to YAML without the leading dashes?

From the YAML specification:
YAML uses three dashes (“---”) to separate directives from document content. This also serves to signal the start of a document if no directives are present.
That means: Your example starts as a valid YAML document should start. Nothing wrong with it.

Under the hood, to_yaml uses Psych to parse and emit your data.
Though there are optional parameters you can specify (listed here), none suppress the leading dashes.
The simplest approach is to strip the marker, together with the newline that follows it, from the start of the output:
object.to_yaml.sub(/\A---\n/, "")
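A minimal sketch of the stripping approach, assuming Ruby's bundled Psych. The helper name yaml_without_marker is made up for illustration and is not part of Rails or Psych. It only touches the marker at the very start of the output, so a literal --- inside a later value is left alone.

```ruby
require "yaml"

# Hypothetical helper (not part of Rails or Psych): drop the leading
# document-start marker that to_yaml emits, plus the space or newline after it.
def yaml_without_marker(object)
  object.to_yaml.sub(/\A---[ \n]/, "")
end

data = { "a" => { "b" => { "c" => "something" } } }
output = yaml_without_marker(data)

puts output
# The stripped text is still a valid YAML document and loads back unchanged.
puts YAML.safe_load(output) == data
```

The result is still valid YAML because the marker is optional when a stream holds a single document and no directives.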

Related

YAML: encoding vs semantic differences

I would like to get a better understanding of which aspects of YAML refer to the encoding of data and which refer to semantics.
A simple example:
test1: dGVzdDE=
test2: !!binary |
  dGVzdDE=
test3:
- 116
- 101
- 115
- 116
- 49
test4: test1
Which of these values (if any) are equivalent?
I would argue that test1 encodes the literal string value dGVzdDE=. test2 and test3 both encode the same array, just using a different encoding. I am unsure about test4, it contains the same bytes as test2 and test3 but does this make it the equivalent value or is a string in YAML different from a byte array?
Different tools seem to produce different answers:
https://onlineyamltools.com/convert-yaml-to-json suggests that test2 and test3 are equivalent, but different from test4
https://yaml-online-parser.appspot.com/ suggests that test2 and test4 are equivalent, but different from test3
To yq, all entries are different (yq < test.yml):
{
  "test1": "dGVzdDE=",
  "test2": "dGVzdDE=\n",
  "test3": [
    116,
    101,
    115,
    116,
    49
  ],
  "test4": "test1"
}
What does the YAML spec intend?
Equality
You're asking for equivalence but that's not a term in the spec and therefore cannot be discussed (at least not without definition). I'll go with discussing equality instead, which is defined by the spec as follows:
Two scalars are equal only when their tags and canonical forms are equal character-by-character. Equality of collections is defined recursively.
One node in your example has the tag !!binary but the others do not have tags. So we must check what the spec says about tags of nodes that don't have explicit tags:
Tags and Schemas
The YAML spec says that every node is to have a tag. Any node that does not have an explicit tag gets a non-specific tag assigned. Nodes are divided into scalars (that get created from textual content) and collections (sequences and mappings). Every non-plain scalar node (i.e. every scalar in quotes or given via | or >) that does not have an explicit tag gets the non-specific tag !, every other node without explicit tag gets the non-specific tag ?.
During loading, the spec defines that non-specific tags are to be resolved to specific tags by means of a schema. The specification describes some schemas, but does not require an implementation to support any particular one.
The failsafe schema, which is designed to be the most basic schema, will resolve non-specific tags as follows:
on scalars to !!str
on sequences to !!seq
on mappings to !!map
and that's it.
A schema is allowed to derive a specific tag from a non-specific one by considering the kind of non-specific tag, the node's position in the document, and the node's content. For example, the JSON schema will give a scalar true the tag !!bool due to its content.
The spec says that the non-specific tag ! should only be resolved to !!str for scalars, !!seq for sequences, and !!map for mappings, but does not require this. This is what most implementations support, and it means that if you quote your scalar, you will get a string. This is important because it lets you write the scalar "true" in quotes to avoid getting a boolean value.
By the way, the spec does not say that every step defined there is to be implemented slavishly as defined in the spec; it is more of a logical description. A lot of implementations do not actually transition from non-specific tags to specific tags, but instead directly choose native types for the YAML data they load according to the schema rules.
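As a small illustration of the quoting rule, here is a sketch using Ruby's bundled Psych. Psych applies its own resolution rules and goes straight to native Ruby types, so this shows one implementation's behavior rather than anything the spec mandates.

```ruby
require "yaml"

doc = <<~YAML
  plain: true
  quoted: "true"
  number: 42
  quoted_number: '42'
YAML

# Plain scalars are resolved by content; quoted scalars stay strings.
YAML.safe_load(doc).each do |key, value|
  puts "#{key}: #{value.inspect} (#{value.class})"
end
# plain: true (TrueClass)
# quoted: "true" (String)
# number: 42 (Integer)
# quoted_number: "42" (String)
```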
Applying Equality
Now that we know how tags are assigned to nodes, let's go over your example:
test1: dGVzdDE=
test2: !!binary |
  dGVzdDE=
The two values are immediately not equal because even without the tag, their content differs: Literal block scalars (introduced with |) contain the final linebreak, so the value of test2 is "dGVzdDE=\n" and therefore not equal to the test1 value. You can introduce the literal scalar with |- instead to chop the final linebreak, which I suppose is your intent. In that case, the scalar content is identical.
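The difference between | and |- can be checked with any loader. The sketch below uses Ruby's Psych and leaves the !!binary tag out on purpose, so that only the scalar content is compared. The base64 text is decoded by hand at the end.

```ruby
require "yaml"

clip  = YAML.safe_load("value: |\n  dGVzdDE=\n")["value"]
strip = YAML.safe_load("value: |-\n  dGVzdDE=\n")["value"]

p clip   # => "dGVzdDE=\n"  (the final line break is part of the content)
p strip  # => "dGVzdDE="    ("|-" chops it)

# Decoding the base64 text by hand yields the bytes listed under test3.
p strip.unpack1("m").bytes  # => [116, 101, 115, 116, 49]
```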
Now for the tag: The value of test1 is a plain scalar, hence it has the non-specific tag ?. The question is now: Will this be resolved to !!binary? There could be a schema that does this, but the spec doesn't define one. But think about it: A schema that assigns every scalar the tag !!binary if it looks like base64-encoded data would be a very specific one.
As for the other values: The test3 value is a sequence, so obviously not equal to any other value. The test4 value contains content not present anywhere else, therefore also not equal.
But yaml-online-parser does things!
Yes. The YAML spec explicitly states that the target of loading YAML data is native data types. Tags are thought of as generic hints that can be mapped to native data types by a specific implementation. So an !!str for example would be resolved to the target language's string type.
How this mapping to native types is done is implementation-defined (and must be, since the spec cannot cater to every language out there). yaml-online-parser uses PyYAML and what it does is to load the YAML into Python's native data types, and then dump it again. In this process, the !!binary will get loaded into a Python binary string. However, during dumping, this binary string will get interpreted as UTF-8 string and then written as plain scalar. You can argue this is a bug, but it certainly doesn't violate the spec (as the spec doesn't know what a Python binary string is and therefore does not define how it is to be represented).
In any case, this shows that as soon as you transition to native types and back again, anything goes and nothing is certain, because native types are outside of the spec. Different implementations will give you different outputs because they are allowed to. !!binary is not a tag defined in the JSON schema, so even translating your input to JSON is not well-defined.
If you want an online tool that shows you canonical YAML representation without loading data into native types and back, you can use the NimYAML testing ground (my work).
Conclusion
The question of whether two YAML inputs are equal is an academic one. Since YAML does allow for different schemas, the question can only be definitely answered in the context of a certain schema.
However, you will find very few formal schema definitions outside of the YAML spec. Most applications that do use YAML will document their input structure in a less formal way, and most of the time without discussing YAML tags. This is fine because, as discussed before, loading YAML does not need to directly implement the logical process described in the spec.
Your answer for practical purposes should come from the documentation of the application consuming the YAML data. If the documentation is very good, it will answer this, but a lot of YAML-consuming applications just use the default settings of the YAML implementation they use without telling you about this.
So the takeaway is: Know your application and know the YAML implementation it uses.

Is it possible to have a special character in a string that comes from a YAML file?

I'm working on a translation project and I'm moving all of the English strings out of the views and into a YAML file. Some of the very well written strings employ special characters such as ampersands and en dashes.
Is there any way to include those?
In the meantime I've turned "&" to "and" and "–" to "--", but, at least in the English version, I feel like the copy starts to lose its flavor. I doubt the Chinese version will miss these, but maybe they will want different special characters that I don't know about.
You can generally have special characters in YAML file values as long as they're not at the beginning of an unquoted string.
In the case of &, for example, if it is the first character of an unquoted value, the YAML parser will treat it as the start of an anchor. If it appears later in the value, as in key: this & that, it will be read as part of the string, as you would expect. Putting the value in quotes avoids the problem in either position.
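A quick way to see this is to load a few sample values. The sketch below uses Ruby's Psych. The keys and strings are invented for the example.

```ruby
require "yaml"

doc = <<~YAML
  middle: this & that
  quoted: "& that"
  dash: pages 10–12
  anchored: &that value
YAML

YAML.safe_load(doc).each { |key, value| puts "#{key}: #{value.inspect}" }
# middle: "this & that"
# quoted: "& that"
# dash: "pages 10–12"
# anchored: "value"      <- "&that" was consumed as an anchor name
```

The last entry is the case to watch for: nothing fails, the text after the ampersand simply disappears from the loaded string.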
For more information about what you can and can't have in your YAML strings (and what are considered special characters), see:
YAML Ruby Cookbook
The question and accepted answer for Do I need quotes for strings in Yaml?

What do the last lines in Lua's `package.config` mean?

The Lua specs say about package.config (numbering added by me):
1. The first line is the directory separator string. Default is '\' for Windows and '/' for all other systems.
2. The second line is the character that separates templates in a path. Default is ';'.
3. The third line is the string that marks the substitution points in a template. Default is '?'.
4. The fourth line is a string that, in a path in Windows, is replaced by the executable's directory. Default is '!'.
5. The fifth line is a mark to ignore all text before it when building the luaopen_ function name. Default is '-'.
My paraphrasing:
1. Absolutely clear (the example for Windows/other systems makes it foolproof).
2. There can be multiple paths in a path string. They are separated by this symbol (; by default).
3. Wherever Lua finds this character in the path string (? by default), it will replace it with the module name supplied to the require or package.searchpath functions and check whether that file exists.
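The first three lines can be modelled in a few lines of Ruby. This is only a rough sketch of the behavior described above, not Lua's actual code. The helper name candidate_paths and the example path are made up. It also assumes, as package.searchpath does by default, that dots in the module name are turned into directory separators before substitution.

```ruby
# Rough model of the first three package.config lines; not Lua's actual code.
DIR_SEP      = "/"  # line 1: directory separator
TEMPLATE_SEP = ";"  # line 2: separates templates in a path
NAME_MARK    = "?"  # line 3: substitution point in a template

# Hypothetical helper: list the file names a search would try, in order.
def candidate_paths(module_name, path)
  file_part = module_name.gsub(".", DIR_SEP)
  path.split(TEMPLATE_SEP).map { |template| template.gsub(NAME_MARK, file_part) }
end

puts candidate_paths("foo.bar", "./?.lua;/usr/share/lua/?/init.lua")
# ./foo/bar.lua
# /usr/share/lua/foo/bar/init.lua
```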
So far, so good, but the last two lines aren't entirely clear to me.
4. Why does it say "in a path in Windows"? Does that mean on other platforms, this doesn't have any significance? If so, why?
5. It took me a while to make sense of this, but eventually another part of the specs gave me a hint:
The name of this C function is the string "luaopen_" concatenated with a copy of the module name where each dot is replaced by an underscore. Moreover, if the module name has a hyphen, its prefix up to (and including) the first hyphen is removed. For instance, if the module name is a.v1-b.c, the function name will be luaopen_b_c.
So is this symbol (- by default) intended to make different versions of a library available at the same time – potentially with an unprefixed symlink to the newest version so that the same library would be accessible on two paths (i.e. under two module names), but with only one C symbol name?
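To make the quoted rule concrete, here is a small Ruby model of it. The helper luaopen_name is hypothetical and only mirrors the wording of the passage quoted above. It is not taken from Lua's source.

```ruby
# Hypothetical helper modelling the quoted rule; not Lua's actual code.
IGNORE_MARK = "-" # fifth line of package.config

def luaopen_name(module_name)
  # Drop everything up to and including the first ignore mark, if present.
  _before, mark, rest = module_name.partition(IGNORE_MARK)
  stem = mark.empty? ? module_name : rest
  "luaopen_" + stem.tr(".", "_")
end

puts luaopen_name("a.v1-b.c")  # => luaopen_b_c
puts luaopen_name("b.c")       # => luaopen_b_c
```

Under this rule, both module names resolve to the same C symbol, which is the situation the question describes.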
4: On Linux, libraries are usually installed system-wide; on Windows, however, libraries can be installed in the current directory.
5: I believe versioning and project forking are the reason behind this.

FitNesse: Can't see the difference between the expected and the actual result in failed assertion

I'm using FitNesse to test web service responses using check to compare the expected to the actual response.
In some cases the check is failing and I can't see what the differences are between the expected and the actual that is causing it to fail.
Here's a screenshot from what it's telling me in a specific instance (of many similar instances):
Feel free to point out the obvious; it's probably staring me in the face and I'm looking so hard I can't see it!
I would check that the expected and actual strings are both written with the same text encoding. I've seen this error plenty of times when the text comparison failed because a comma or apostrophe was written in a different encoding.
It is possible that your string contains extra spaces in the actual value. FitNesse, being HTML-based, will not respect leading or trailing spaces. It might not handle any extra spaces inside the actual value either. So this can cause the result to be different, but not visibly so.
See if you can add some debug messages that would help you see the extra spaces, or at least count the number of characters in both strings.
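The kind of debug output meant here could look like the following sketch. It is written in Ruby purely for illustration. A fixture would do the equivalent in its own language, and first_difference is a made-up helper, not part of FitNesse.

```ruby
# Hypothetical debugging helper; not part of FitNesse.
# Returns the position and escaped form of the first differing character.
def first_difference(expected, actual)
  longest = [expected.length, actual.length].max
  index = (0...longest).find { |i| expected[i] != actual[i] }
  return nil unless index

  { index: index, expected: expected[index].inspect, actual: actual[index].inspect }
end

expected = "line one\nline two"
actual   = "line one\r\nline two "

puts "lengths: #{expected.length} vs #{actual.length}"
puts "expected: #{expected.inspect}"  # inspect escapes \r, \n and shows trailing spaces
puts "actual:   #{actual.inspect}"
p first_difference(expected, actual)
```

Printing the escaped form and the lengths side by side makes carriage returns and trailing spaces visible even when the rendered HTML collapses them.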
This question doesn't specify whether Slim or Fit is being used, or which Slim server/plugin if using Slim, but I found the following to be true for me using FitNesse release 20130530 and fitSharp release 2.2:
1. Non-ASCII characters and apostrophes (single-quote characters) in input arguments/parameters that are strings are HTML encoded. The values in my FitNesse test tables are HTML encoded, but only the required syntax characters and (double) quotes; not the non-ASCII characters (and FitNesse doesn't seem to have any problems storing those values).
2. EOL characters in the input arguments that are strings consist of a linefeed character only.
3. I imagine that because I'm using .NET, EOLs in my return values consist of carriage return and linefeed characters.
Because of [1], I'm HTML-encoding non-ASCII characters (but not the HTML syntax characters or quotes). Because of [2] and [3], I'm now removing carriage return characters from my fixture return values. Both changes seem to have resolved this issue for me, and expected and actual values are now reported as being the same.
Whitespace has troubled me often. The resulting HTML just collapses whitespace, but the compare in code does not.
I now use a fixture to make differences more explicit to me. Example usage: http://fhoeben.github.io/hsac-fitnesse-fixtures/examples-results/HsacExamples.SlimTests.UtilityFixtures.CompareFixtureTest.html
Newer versions of FitNesse (since 20151230) do a diff on the expected and actual result values. Has that helped you at all?

SnakeYAML: How to disable underscore stripping when parsing?

Here's my problem. I have a YAML doc that contains the following pair:
run_ID: 2010_03_31_101
When this gets parsed at org.yaml.snakeyaml.constructor.SafeConstructor.ConstructYamlInt:159, the underscores get stripped and the constructor returns the Long 20100331101 instead of the unmodified String "2010_03_31_101" that I really need.
QUESTION: How can I disable this behavior and force the parser to use the String constructor instead of Long?
OK. Got an answer from their mailing list. Here it is:
Hi, according to the spec (http://yaml.org/type/int.html): Any “_” characters in the number are ignored, allowing a readable representation of large values
You have a few ways to solve it.
1) do not rely on implicit types, use quotes (single or double): run_ID: '2010_03_31_101'
2) Turn off resolver for integers (as it is done here for floats) link 1 link 2
3) Define your own pattern for int link 3
Please be aware that when you start to deviate from the spec other recipients may fail to parse your YAML document. Using quotes is safe.
Andrey
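Option 1 works at the YAML level, so it can be checked with any loader. The sketch below uses Ruby's Psych rather than SnakeYAML, only to show the effect of quoting. How the unquoted value is typed depends on each parser's resolver, so no particular result is claimed for it here.

```ruby
require "yaml"

unquoted = YAML.safe_load("run_ID: 2010_03_31_101")["run_ID"]
quoted   = YAML.safe_load("run_ID: '2010_03_31_101'")["run_ID"]

# The unquoted value is subject to implicit typing by the parser.
puts "unquoted: #{unquoted.inspect} (#{unquoted.class})"
# The quoted value is always loaded as a string, underscores intact.
puts "quoted:   #{quoted.inspect} (#{quoted.class})"
```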

Resources