Machine parseable error messages - bazel

(From https://groups.google.com/d/msg/bazel-discuss/cIBIP-Oyzzw/caesbhdEAAAJ)
What is the recommended way for rules to export information about failures such that downstream tools can include them in UIs.
Example use case:
I ran bazel test //my:target, and one of the actions for //my:target fails because there is an unknown variable "usrname" in my/target.foo at line 7 column 10. It would also like to report that "username" is a valid variable and this is a possible misspelling. And thus wants to suggest an addition of an "e" character.
One way I have thought to do this is to have a separate file that my action produces //my:target.errors that is in a separate output group and have it write machine parseable data there in addition to human readable data on stdout.
I can then find all of these files and parse the data in them in downstream tools.
Is there any prior work on this, or does everything just try to parse the human readable output?

I recommend running the error checkers as extra actions.
I don't think Bazel currently has hooks for custom error handlers like you describe. Please consider opening a feature request: https://github.com/bazelbuild/bazel/issues/new

Related

how to show all gtest case by bazel without "test" cmd

I want to query all gtest cases by bazel,
parameter "--gtest_filter" only can be used with "bazel test " cmd
and I am try to use "bazel query bazel query //xxx:all", but it will show the test list which defined in BUILD file , I want to get the cases list from xxx.cc files.
This is not a job that bazel query can do. Query operates on the graph structure of targets. A fundamental design decision of Bazel is that this graph can be computed by looking only at BUILD files and the .bzl files they depend on. In particular, parsing source files is not allowed.
(The argument to --test_filter is simply passed through the test runner; Bazel does not know what it represents.)
If you use CLion with the Bazel plugin you get the following view for googletest tests:
This works even with Catch2 (but for Catch2 the view is not so nice). I guess that's some IDE magic here - nevertheless, it gives you what you want. I assume you can also come up with some type of Bazel Aspect that produces this information for you.
I tested this also with Lavender (with minor modifications) and Visual Studio which gives me in the test overview also a list of all test:

XTSE1650: net.sf.saxon.trans.LicenseException: Requested feature (xsl:import-schema) requires Saxon-EE

I use java and saxonee-9.5.1.6.jar included build path , when run, getting these errors at different times.
Error at xsl:import-schema on line 6 column 169 of stylesheet.xslt:
XTSE1650: net.sf.saxon.trans.LicenseException: Requested feature (xsl:import-schema)
requires Saxon-EE
Error on line 1 column 1
SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 1 error detected.
I open .xslt file in hex editor and dont see any different character at the beginning AND
I use transformerfactory in a different project but any error I get.
Check what the implementation class of tFactory is. My guess is it is probably net.sf.saxon.TransformerFactoryImpl - which is basically the Saxon-HE version.
When you use JAXP like this, you're very exposed to configuration problems, because it loads whatever it finds sitting around on the classpath, or is affected by system property settings which could be set in parts of the application you know nothing about.
If your application depends on particular features, it's best to load a specific TransformerFactory, e.g. tFactory = new com.saxonica.config.EnterpriseTransformerFactory().
I don't know whether your stylesheet expects the source document to be validated against the schema, but it it does, note that this isn't automatic: you can set properties on the factory to make it happen.
I would recommend using Saxon's s9api interface rather than JAXP for this kind of thing. The JAXP interface was designed for XSLT 1.0, and it's a real stretch to use it for some of the new 2.0 features like schema-awareness: it can be done, but you keep running into limitations.

Running spss syntax file on multiple data files automatically

I have a spss syntax file that I need to run on multiple files each in a different directory with the same name as the file, and I am trying to too do this automatically. So far I have tried doing it with syntax code and am trying to avoid doing python is spss, but all I have been able to get is the code bellow which does not work.
VECTOR v = key.
LOOP #i = 1 to 41.
GET
FILE=CONCAT('C:\Users\myDir\otherDir\anotherDir\output\',v(#i),'\',v(#i),'.sav').
DATASET NAME Data#i WINDOW=FRONT.
*Do stuff to the opened file
END LOOP.
EXE.
key is the only column in a file that contains all the names of the files.
I am having trouble debugging since I don't know how to print to the screen if it is possible. So my question is: is there a way to get the code above to work, or another option that accomplishes the same thing?
You can't use an expression like that on a GET command. There are two choices. Use the macro language to put this together (see DEFINE in the Command Syntax Reference via the Help menu) or use the SPSSINC PROCESS FILES extension command or your own Python code to select the files with a wildcard.
The extension command or a Python program require the free Python Essentials available from the SPSS Community website or available with your Statistics version.

Deedle - what's the schema format for readCsv

I was using Deedle in F# to read a txt file (no header) to data frame, and cannot find any example about how to specify the schema.
let df= Frame.ReadCsv(datafile, separators="\t", hasHeaders=false, schema=schema)
I tried to give a string with names separated by ',', but seems don't work.
let schema = #"name, age, address";
I did some search on the doc, but only find following - don't know where I can find the info. :(
schema - A string that specifies CSV schema. See the documentation
for information about the schema format.
The schema format is the same as in the CSV type provider in F# Data.
The only problem (quite important!) is that the Deedle library had a bug where it completely ignores the schema parameter, so no matter what you provide, it would be ignored.
I just submitted a pull request that fixes the bug and also includes some examples (in the form of unit tests). See the pull request here (and click on "Files changed" to see the samples).
If you do not want to wait for a new release, just get the code from my GitHub fork and build it using build.cmd in the root (run this for the first time to restore packages). The complete build requires local installation of R (because it builds R plugin too), but it should build Deedle.dll and then fail... (After the first run of build.cmd, you can just use Deedle.sln solution).

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the objective-c wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and specifying a relatively low specificness indicator (I've used 100 for both -- the lower, the more likely to be split), you can get mecab to tend to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dict/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (needs not exist).
Both the system dictionary (-t option) and the user dictionary (-f option) are encoded in UTF-8. This may be wrong, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
A more thorough fix would be to get the source of the default dictionary and manually remove all compoun nouns you want to get split, then recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production mode over a longer period of time usually involves a certain amount of dictionary maintenance (adding/removing entries) regularly.

Resources