Where is the clang-format rules list? - clang-format

I am just beginning to use clang-format, and what I would like to do it map it's rules to my coding standard and identify discrepancies.
The clang-format documentation does a great job describing rules which have options
However, after playing with it, I realized it does a bunch of stuff that is not optional, and I can't find documentation on it.
For instance, it will add spaces around binary operators, turning a = b+c; into a = b + c; There is no way to turn this off, so it is not listed on the ClangFormatStyleOptions page.
I'm fine with not being able to rules off, but I can't find anywhere that it says it is going to add spaces around binary operators. I don't know if applies to all binary operators or just some. Thinking beyond this one rule, I basically have no idea what it is going to do without running a ton of test cases and finding out.
I can't seem to find a list of ALL the rules it applies. I am sure I am missing it somewhere. Can someone point it out?

Related

reading/parsing common lisp files from lisp without all packages available or loading everything

I'm doing a project which involves parsing the histories of common lisp repos. I need to parse them into list-of-lists or something like that. Ideally, I'd like to preserve as much of the original source file syntax as possible, in some way. For example, in the case of the text #+sbcl <something>, which I think means "If our current lisp is sbcl, read <something>, otherwise skip it", I'd like to get something like (#+ 'sbcl <something>).
I originally wrote a LALR parser in Python, which sort of worked, but it's not ideal for many reasons. I'm having a lot of difficulty getting correct output, and I have tons of special cases to add.
I figured that what I should really do is is use lisp itself, since it already has a lisp parser built in. If I could just read a file into sexps, I could dump it into something (cl-json would do) for further processing down the line.
Unfortunately, when I attempt to read https://github.com/fukamachi/woo/blob/master/src/woo.lisp, I get the error
There is no package with the name WOO.EV.TCP
which is of course coming from line 80 of that file, since that package is defined in src/ev/tcp.lisp, and we haven't read it.
Basically, is it possible to just read the file into sexps without caring whether the packages are defined or if they contain the relevant symbols? If so, how? I've tried looking at the hyperspec reader documentation, but I don't see anything that sounds relevant.
I'm out of practice with actually writing common lisp, but it seems potentially possible to hack around this by handling the undefined package condition by creating a blank package with that name, and handling the no-symbol-of-that-name-in-package condition by just interning a given symbol. I think. I don't know how to actually do this, I don't know if it would work, I don't know how many special cases would be involved. Offhand, the first condition is called no-such-package, but the second one (at least in sbcl) is called simple-error, so I don't even know how to determine whether this particular simple-error is the no-such-symbol-in-that-package error, let alone how to extract the relevant names from the condition, fix it, and restart. I'd really like to hear from a common lisp expert that this is the right thing to do here before I go down the road of trying to do it this way, because it will involve a lot of learning.
It also occurs to me that I could fix this by just sed-ing the file before reading it. E.g. turning woo.ev.tcp:start-listening-socket into, say, woo.ev.tcp===start-listening-socket. I don't particularly like this solution, and it's not clear that I wouldn't run into tons more ugly special cases, but it might work if there's no better answer.
I am almost sure there is no easy portable way to do this for a number of reasons.
(Just limiting things to the non-existent-package problem for now.)
First of all there is no portable access into the bit of the reader which decides that tokens are going to be symbols and then looks for package markers &c: that just happens according to the rules in 2.3. So you can't easily intervene in this.
Secondly there's not portably enough information in any kind of condition the reader might signal to be able to handle them.
There are several possible ways out of this bit of the problem.
If you felt sufficiently heroic you might be able to teach the reader that all of the token-starting characters are in fact things you control and then write a token-reader that somehow deals with the whole package thing by returning some object which isn't a symbol. But to do that you need to deal with numbers, and if you think that's simple, well, it's not.
If you felt less heroic you could write a more primitive token-reader which just doesn't even try to deal with anything except grabbing all the characters needed and returns some kind of object which wraps a string. This would avoid the whole number problem at the cost of losing a lot of intofmration.
If you don't care about portability, find an implementation, understand how its reader does it, and muck around with it. There are more open source or source-available implementations than I can easily count (perhaps I am not very good at counting) so this is a pretty good approach. It's certainly what I'd do.
But this is only the start of the problems. The CL reader is hairy and, in its standard configuration (the configuration which is used for things like compile-file unless people have arranged otherwise) can run completely arbitrary code at read time, including code which modifies the reader itself, some of which may do so in an implementation-dependent way. And people use this: there's a reason Lisp is called the 'programmable programming language' and it's that people program it.
I've decided to solve this using sed (actually Python's re.sub, but who's counting?) because it'll work for my actual use case, and was easy.
For future readers: The various people saying this is impossible in general are probably right. The other questions posted by #Svante look like good easy ways to solve part of the problem. Other parts of the problem might be solved more elegantly by replacing the reader macros for #., #+, #-, etc with ones which just make a list, which sounds less heroic than the suggestions from #tfb, but I don't have time for that shit.

Profile Antlr grammar

I found this question here in which OP asks for a way to profile an ANTLTR grammar.
However the answer is somewhat unsatisfying as it is limited to grammars without actions and - even more important - it is an automated profiling that will (as I see it) use the defaul constructor of the generated lexer/parser to construct it.
I need to profile a grammar, that does contain actions and that has to be constructed using a custom constructor. Therefore I'd need to be able to instantiate the lexer + parser myself and then profile it.
I was unable to find any information on this topic. I know there is a profiler for IntelliJ but it works quite similar to the one described in the linked question's answer (maybe it's even the same).
Does anyone know how I can profile my grammar with this special needs? I don't need any fancy GUI. I'd be satisified if I get the result printed to the console or something like that.
To wrap it up: I'm searching for either a tool or a hint on how to write some code that lets me profile my ANTLR grammar (with self-instantiated lexer/parser).
Btw my target language is Java so I guess the profiler has to be in Java as well.
A good start is setting Parser.setProfile() to true and examine what you get from Parser.getParseInfo() after a parse run. I haven't yet looked closer what's provided in detail by the profiling result, but it's on the todo list for my vscode extension for ANTLR4 to provide profiling info for grammars to help improving them.
A hint for getting from the decision info to a specific rule: there's a decision number, which is an index into ATN.decisionToState. The DecisionState instance you can get by this is an ATNState descendant, which allows to get a ATNState.ruleIndex from it. The rule index then can be used with your parser's ruleNames property to find the name of that rule. The value is also what is used for the rule's enum entry.

How to use Agda's auto proof search effectively?

When writing proofs I noticed that Agda's auto proof search frequently wouldn't find solutions that seem obvious to me. Unfortunately coming up with a small example, that illustrates the problem seems to be hard, so I try to describe the most common patterns instead.
I forgot to add -m to the hole to make Agda look at the module scope. Can I make that flag the default? What downsides would that have?
Often the current hole can be filled by a parameter of the function I am about to implement. Even when adding -m, Agda will not consider function parameters or symbols introduced in let or where clauses though. Is there something wrong with simply trying all of them?
When viewing a goal, symbols introduced in let or where clauses are not even displayed. Why?
What other habits can make using auto more effective?
Agda's auto proof search is hardwired into the compiler. That makes it fast,
but limits the amount of customization you can do. One alternative approach
would be to implement a similar proof search procedure using Agda's
reflection mechanism. With the recent beefed up version of reflection using
the TC monad,
you no longer need to implement your own unification procedure.
Carlos
Tome's been working on reimplementing these ideas (check out his code
https://github.com/carlostome/AutoInAgda ). He's been working on several
versions that try to use information from the context, print debugging info,
etc. Hope this helps!

Gendarme Rules Customisation

Does anyone know the correct way to explicitly specify which rules Gendarme will use? Or which rules to exclude? I'm not having a lot of joy searching the Mono documentation for the answer.
What I'm trying to do is to specify the rules one by one in the Gendarme rules.xml file like this:
<rules include="AvoidAssemblyVersionMismatchRule" from="Gendarme.Rules.BadPractice.dll"/>
Doing this, I'm hoping we can then switch off the rules we don't care about. The problem is, after specifying all the rules in this way, I'm getting a different number of defects detected compared with when I use the default method Gendarme provides, which is of the form:
<rules include="*" from="Gendarme.Rules.BadPractice.dll"/>
<rules include="*" from="OTHER DLL NAMES"/>
Has anyone done this before? Or can anyone point me in the direction of some Gendarme rules usage documentation?
To answer my own question:
Specifying the rules explicitly as I outlined above is the correct way to customise the rules list, the reason I was getting a different number of results back was because the "default" rule set in Gendarme leaves out scanning for Code Smells, once I added this scan to the default list, the defect totals matched.

Parsing commands from user input

This question is intended to be a discussion of people's personal opinions in handling user input.
This portion of the project that I am working on handles user input in a manner similar to an IRC chat. For instance, there are set commands and whatnot, for chatting, executing actions, etc.
Now, I have several options to choose from for parsing this input. I could go with regular expressions, I could parse it directly (ie a large switch statement with all supported commands, simply checking the first x number of characters in the user input), or could even go crazy and add in a parser similar to Flex/Bison implementations. One other option I was considering was defining all commands in an XML file to separate them from the code implementation.
So, what are the thoughts of the community?
I'd go with a nice mixed bag of all.
Obviously you'll have to sanitize the input. Make sure there's no nasty stuff there, depending on where the input is going to prevent SQL injection, XSS, CSRF etc...
But when your input is clean, you could go with a regexp that catches the ones intended as command and gets all necessary submatches (command parameters etc.) and then have some sort of dispatcher-switch statement or similar.
There really is no cover-all best practice here, apart from always always and quadruple-always making sure user input is sanitized. Apart from that, go with what seems to fit best for your case.
Obviously there are those that say if you've got a problem and you're thinking of using reg exps to solve said problem, you've got two problems, but used cautiously, they're the best thing ever. Just remember that regexp-monsters can read to really poor readability really quick.

Resources