I am currently looking into how "#pragma omp for simd" is actually recognised in LLVM. To my knowledge, Clang will parse it and set metadata in the IR to indicate this force-vectorisation hint, and later optimisation passes will read it and vectorise the marked loop. Therefore, the loop should be vectorised even if the compiler thinks it might not be safe to do so?
So my assumption is that such force-vectorisation hints should bypass both the vectorisation legality check and the cost model. However, in LoopVectorize.cpp I can't see how this is done: every loop is sent through the legality check LVL.canVectorize(), and if that check fails the pass returns false directly, without ever reaching the vectorisation stage.
Is there anything wrong with my assumption about how force-vectorisation hints are used?
Thanks in advance,
T
While LLVM has publicized a goal of going beyond GCC in its implementation of omp simd, I haven't seen much of that myself, and I don't see any updates to their high-level docs in the last 2 years. While it might be loosely described as "vectorize if you know how," I don't think anyone considers it as overriding proven dependencies. I wouldn't be surprised if it acts like auto-vectorization with the cost model suspended and possible but unproven dependencies ignored.
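To make that concrete, here is a small, unverified sketch of the kind of loop where the pragma is exactly the programmer asserting safety the compiler cannot prove; the file name scale.cpp is made up, but the Clang flags are real ones:

// Possible loop-carried dependence: if off were, say, -1, iterations would
// genuinely depend on each other, so plain auto-vectorisation of this loop
// cannot be proven safe. "#pragma omp simd" is the programmer asserting
// that, for the values of off actually used, it is safe.
void scale(float* a, int off, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        a[i] = 2.0f * a[i + off];
}

// Build and see what the vectoriser did (these Clang options exist; the
// exact remark text varies by version):
//   clang++ -O2 -fopenmp-simd -Rpass=loop-vectorize \
//           -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize \
//           -c scale.cpp
// With -S -emit-llvm you can also inspect the IR for the loop metadata the
// pragma adds (llvm.loop.vectorize.enable plus the parallel-access
// annotations); dropping the pragma typically makes the vectoriser report
// the unproven dependence instead of vectorising.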
Clang has a number of sanitizers that enable runtime checks for questionable behavior. Unfortunately, they can't all be enabled at once.
It is not possible to combine more than one of the -fsanitize=address, -fsanitize=thread, and -fsanitize=memory checkers in the same program.
To make things worse, each of those three seems too useful to leave out. AddressSanitizer checks for memory errors, ThreadSanitizer checks for race conditions and MemorySanitizer checks for uninitialized reads. I'm worried about all of those things!
Obviously, if I have a hunch about where a bug lies, I can choose a sanitizer according to that. But what if I don't? Going further, what if I want to use the sanitizers as a preventative tool rather than a diagnostic one, to point out bugs that I didn't even know about?
In other words, given that I'm not looking for anything in particular, which sanitizer should I compile with by default? Am I just expected to compile and test the whole program three times, once for each sanitizer?
As you pointed out, sanitizers are typically mutually exclusive: you can combine only ASan+UBSan+LSan, via -fsanitize=address,undefined,leak (maybe also adding the integer sanitizer via -fsanitize=...,integer if your program does not contain intentional unsigned overflows). So the only way to ensure complete coverage is to do separate QA runs with each of them, which implies rebuilding the software for every run. BTW, doing yet another run under Valgrind is also recommended.
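For concreteness, here is a small sketch; the file name demo.cpp and the specific bugs are made up for illustration, but the flags are the real Clang ones, and it shows which sanitizers can share a build:

// demo.cpp: one bug per sanitizer family. Each sanitized build reports the
// bugs of the kind it can see, which is why separate runs are needed.
//
//   clang++ -g -O1 -pthread -fsanitize=address,undefined,leak demo.cpp  # ASan+UBSan+LSan combine
//   clang++ -g -O1 -pthread -fsanitize=memory demo.cpp                  # MSan needs its own build
//   clang++ -g -O1 -pthread -fsanitize=thread demo.cpp                  # TSan needs its own build
#include <thread>

int main() {
    int* a = new int[4];

    // AddressSanitizer: heap-buffer-overflow (reads one element past the end)
    int oob = a[4];

    // MemorySanitizer: use-of-uninitialized-value (a[3] was never written,
    // and this branch depends on it)
    if (a[3] > 0)
        oob += 1;

    // ThreadSanitizer: data race (two unsynchronised writes to 'shared')
    int shared = 0;
    std::thread t([&] { shared = 1; });
    shared = 2;
    t.join();

    delete[] a;
    return oob + shared;
}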
Using ASan in production has two aspects. On one hand, common experience is that some bugs can only be detected in production, so you do want to occasionally run sanitized builds there to increase test coverage [*]. On the other hand, ASan has been reported to increase the attack surface in some cases (see e.g. this oss-security report), so using it as a hardening solution (to prevent bugs rather than detect them) is discouraged.
[*] As a side note, Asan developers also strongly suggest using fuzzing to increase coverage (see e.g. Cppcon15 and CppCon17 talks).
[**] See Asan FAQ for ways to make AddressSanitizer more strict (look for "aggressive diagnostics")
We have a huge code base, and we are hitting issues that would have been caught at compile time in statically typed languages such as Java, but which we don't catch until runtime in Ruby. This is bad, since we generate bugs that most of the time are typos or refactorings that leave some invalid code behind.
Example:
def mysuperfunc
  # some code goes here
  # this was a valid call, but not any more since the
  # enforcesecurity signature changed
  # system.enforcesecurity
end
I mean, IDEs can do it, but some people use Atom or Sublime, so we need something that can "compile" and report that kind of issue so it doesn't reach deployment. What have you been using?
This is generating a small percentage of our bug reports, but since we are forced to produce at a ridiculous pace we don't have 100% code coverage. If there is no tool to help, I'll just make sure everybody uses an IDE and runs the reports with tools such as RubyMine.
Our stack includes RSpec, Minitest and SimpleCov. We enforce code reviews and multi-stage deployments (dev, qa, pre-prod, sandbox, prod), and still some issues reach the higher levels and make us programmers look bad. I'm not looking for magic, just a little automation that might help a bit.
Unfortunately, the Halting Problem, Rice's Theorem, and all the other undecidability and uncomputability results tell us that it is simply impossible in the general case to statically determine any "interesting" property about the runtime behavior of a program. We cannot even statically determine something as simple as "will it halt", so how are we going to determine "is it bug-free"?
There are certain things that can be statically determined, and there are certain restricted programs for which some interesting properties can be statically determined, but largely, this is not possible. And even to the small extent that it is possible, it generally requires the language to be specifically designed to be easy to statically analyze (which Ruby isn't).
That being said, there are tools that use heuristics to point out code that may have problems, there are coding standards that may help avoid bugs, and there are tools to enforce those coding standards. Keywords to search for are "code quality tools", "linter", "static analyzer", etc. You have already been given examples in the other answers and comments, and with those examples and these keywords you'll likely find more.
However, I also wanted to discuss something you wrote:
we are forced to produce at a ridiculous pace we don't have 100% code coverage
That's a problem, which has to be approached from two sides:
Practice, practice, practice. You need to practice testing and writing high-quality code until it comes so naturally to you that not doing it actually ends up being harder and slower. It should become second nature, so that under pressure, when your mind goes blank, the only thing you know how to do is write tests and well-designed, well-factored, high-quality code. Note: I'm talking about deliberate practice, which means setting time aside to really practice. And practice is practice: it's not work, it's not fun, it's not a hobby. If you don't delete the code immediately after you have written it, you are not practicing, you are working.
Sustainable Pace. You should never develop faster than the pace you could sustain indefinitely while still producing well-tested, well-designed, well-factored, high-quality code, having a fulfilling social life, no stress, plenty of free time, etc. This is something that has to be backed and supported and understood by management.
I'm unaware of anything exactly like what you want. However, there are a few gems that will analyze code and warn you about some errors and/or bad practices. Try these:
https://github.com/bbatsov/rubocop
https://github.com/railsbp/rails_best_practices
FLAY
https://rubygems.org/gems/flay
Via the repo https://github.com/seattlerb/flay:
DESCRIPTION:
Flay analyzes code for structural similarities. Differences in literal values, variable, class, and method names, whitespace, programming style, braces vs do/end, etc. are all ignored. Making this totally rad.
FEATURES:
Reports differences at any level of code.
Adds a score multiplier to identical nodes.
Differences in literal values, variable, class, and method names are ignored.
Differences in whitespace, programming style, braces vs do/end, etc are ignored.
Works across files.
Add the flay-persistent plugin to work across large/many projects.
Run --diff to see an N-way diff of the code.
Provides conservative (default) and --liberal pruning options.
Provides --fuzzy duplication detection.
Language independent: Plugin system allows other languages to be flayed.
Ships with .rb and .erb; javascript and others will be available separately.
Includes FlayTask for Rakefiles.
Uses path_expander, so you can use:
dir_arg -- expand a directory automatically
#file_of_args -- persist arguments in a file
-path_to_subtract -- ignore intersecting subsets of files/directories
Skips files matched via patterns in .flayignore (subset format of .gitignore).
Totally rad.
FLOG
https://rubygems.org/gems/flog
Via the repo https://github.com/seattlerb/flog:
DESCRIPTION:
Flog reports the most tortured code in an easy to read pain report.
The higher the score, the more pain the code is in.
FEATURES:
Easy to read reporting of complexity/pain.
Uses path_expander, so you can use:
dir_arg -- expand a directory automatically
#file_of_args -- persist arguments in a file
-path_to_subtract -- ignore intersecting subsets of files/directories
SYNOPSIS:
% ./bin/flog -g lib
Total Flog = 1097.2 (17.4 flog / method)
323.8: Flog total
85.3: Flog#output_details
61.9: Flog#process_iter
53.7: Flog#parse_options
...
There is a Ruby gem called Guard that does automated testing. You can set your own custom rules; for example, you can make it so that any time you modify certain files, the test framework runs automatically.
Here is the link for guard
When writing proofs I noticed that Agda's auto proof search frequently wouldn't find solutions that seem obvious to me. Unfortunately, coming up with a small example that illustrates the problem seems to be hard, so I'll try to describe the most common patterns instead.
I forgot to add -m to the hole to make Agda look at the module scope. Can I make that flag the default? What downsides would that have?
Often the current hole can be filled by a parameter of the function I am about to implement. Even when adding -m, Agda will not consider function parameters or symbols introduced in let or where clauses though. Is there something wrong with simply trying all of them?
When viewing a goal, symbols introduced in let or where clauses are not even displayed. Why?
What other habits can make using auto more effective?
Agda's auto proof search is hardwired into the compiler. That makes it fast, but limits the amount of customization you can do. One alternative approach would be to implement a similar proof search procedure using Agda's reflection mechanism. With the recent, beefed-up version of reflection using the TC monad, you no longer need to implement your own unification procedure.
Carlos Tome's been working on reimplementing these ideas (check out his code: https://github.com/carlostome/AutoInAgda). He's been working on several versions that try to use information from the context, print debugging info, etc. Hope this helps!
When you mark a function as inline, you hint to the compiler that the function is a candidate for inlining. The compiler can still decide that it's not a good idea and ignore it.
Is there a way to see if the function gets inlined or not, without using the disassembler?
Is there some compiler warning that I don't know about maybe?
What are the rules for inlining that the compiler uses? Are there constructs that cause a function to never get inlined for example?
The compiler emits a hint if it can't inline your function. The documentation explains the rules for what can and cannot be inlined.
As for the discretionary decisions that the compiler takes as to whether or not to inline (as opposed to whether or not inlining is possible), they are not documented and can be considered an implementation detail.
I recall that you recently commented on one of my answers to a different question that a particular function was 10 times faster once inlined. Clearly you are interested in inlining but in that particular case I cannot believe such an enormous gain for a function with so many floating point operations. I suspect that inlining is not actually giving you the performance improvements that you think it does.
You can look at the blue dots in the gutter after building the project. If there are blue dots next to a function's code, it has not been inlined in at least one place.
I don't think you can rely on the hints emitted by the compiler. It tells you when a function is not inlined because the unit it lives in isn't in the interface uses clause; if it is not inlined for other reasons, it typically doesn't tell you.
Recently, we received a bug report from one of our users: something on the screen was displayed incorrectly in our software. Somehow, we could not reproduce this in our development environment (Delphi 2007).
After some further study, it appears that this bug only manifests itself when "Code optimization" is turned on.
Are there any people here with experience in hunting down such a Heisenbug? Any specific constructs or coding bugs that commonly cause such an issue in Delphi software? Any places you would start looking?
I'll also just start debugging the whole thing in the usual way, but any tips specific to Optimization-related bugs (*) would be more than welcome!
(*) Note: I don't mean to say that the bug is caused by the optimizer; I think it's much more likely some wonky construct in the code is somehow pushed "over the edge" by the optimizer.
Update
It seems the bug boils down to a record being fully initialized with zeros when there's no code optimization, and the same record containing some random data when there is optimization. In this case, the random data seems to cause an enum type to contain invalid data (to my great surprise!).
Solution
The solution turned out to involve an uninitialized local record variable somewhere deep in the code. Apparently, without optimization the record was reset (heap?), and with optimization turned on, the record was filled with the usual garbage. Thanks to you all for your contributions; I learned a lot along the way!
Typically bugs of this form are caused by invalid memory access (reading uninitialised data, reading off the end of a buffer...) or thread race conditions.
The former will be affected by optimisations causing data layout to be rearranged in memory, and possibly by debug code that initialises newly allocated memory to some value, causing the incorrect code to "accidentally work".
The latter will be affected due to timings changing between optimisation levels. The former is generally much more likely.
If you have some automated way of making freshly allocated memory be filled with some constant value before it is passed to the program, and this makes the crash go away or become reproducible in the debug build, that'll provide a good point to start chasing things.
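If the code base makes that hard, one cheap way to approximate it is to hook the allocator. Below is a minimal C++ sketch of the idea (the thread is about Delphi, where FastMM4's full debug mode, mentioned in another answer, plays this role; the 0xCD fill value is just a conventional choice):

#include <cstdlib>
#include <cstring>
#include <new>

// Replacement global operator new: fill every fresh heap block with a fixed,
// recognizable pattern instead of whatever the underlying allocator left
// there. If the bug now reproduces every time, or vanishes, uninitialised
// heap data becomes the prime suspect.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    std::memset(p, 0xCD, size);
    return p;
}

void operator delete(void* p) noexcept { std::free(p); }

int main() {
    // new[] forwards to the replaced operator new by default, so arrays get
    // the pattern too: every byte of 'buf' starts as 0xCD, not random garbage.
    unsigned char* buf = new unsigned char[16];
    bool patterned = (buf[0] == 0xCD);
    delete[] buf;
    return patterned ? 0 : 1;
}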
It could very well be a memory vs. register issue: your program running fine only because it relies on memory persisting after a free.
I would recommend running your application with FastMM4 in full debug mode to be sure of your memory management.
Another (not free) tool which can be very useful in a case like this is Eurekalog.
Another thing that I've seen: a crash with the FPU registers being botched when calling some outside code (DLL, COM...) while with the debugger everything was OK.
A record that contains different data according to different compiler settings tells me one thing: That the record is not being explicitly initialised.
You may find that the setting of the compiler optimization flag is only one factor that might affect the content of that record - with any uninitialised data structures the one thing that you can rely on is that you can't rely on the initial content of the structure.
In simple terms:
class member data is initialised (to zeros) for new instances of the class, and unit-level variables are likewise zeroed when the program starts
local variables (in functions and procedures) are NOT initialised, except in a few specific cases: interface references, dynamic arrays and strings, and I think (but would need to check) records that contain one or more fields of those types, in which case those fields would be initialised
The question as stated is now a little misleading, because it seems you found your "Heisenbug" easily enough. Now the issue is how to deal with it, and the answer is simply to explicitly initialise your record so that you aren't reliant on whatever behaviour or side effect of the compiler is sometimes taking care of that for you and sometimes not.
Especially in purely native languages like Delphi, you should be more than careful not to abuse the freedom to cast anything to anything.
In other words: one thing I have seen is someone copying the definition of a class (e.g. from the implementation section of the RTL or VCL) into his own code and then casting instances of the original class to his copy.
Now, after upgrading the library the original class came from, you might experience all kinds of weird stuff, like jumping into the wrong methods or buffer overflows.
There's also the habit of using signed integers as pointers and vice versa (instead of Cardinal).
This works perfectly fine as long as your process has only 2 GB of address space, but boot with the /3GB switch and you will see a lot of apps start acting crazy: those made the "pointer = signed integer" assumption at least somewhere.
Does your customer use 64-bit Windows? Chances are the 32-bit app gets a larger address space there. Pretty tough to debug without having such a test system available.
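Here is a quick C++ illustration of that trap (Delphi's Integer corresponds to the 32-bit signed int32_t here); on a 64-bit build the broken round-trip usually fails immediately, which is exactly the "worked for years, broke on the new machine" pattern:

#include <cstdint>
#include <cstdio>

int main() {
    int value = 42;
    int* p = &value;

    // Risky: squeezing a pointer into a 32-bit signed integer. In a 32-bit
    // process this "works" only while every address stays below 2 GB; with
    // /3GB, or a large-address-aware 32-bit app on 64-bit Windows, the top
    // bit can be set and the sign extension on the way back corrupts it.
    std::int32_t as_int = static_cast<std::int32_t>(reinterpret_cast<std::intptr_t>(p));
    int* back_bad = reinterpret_cast<int*>(static_cast<std::intptr_t>(as_int));

    // Safe: intptr_t/uintptr_t are wide enough to round-trip a pointer
    // (i.e. use a pointer-sized integer type, not a fixed 32-bit one).
    std::intptr_t as_intptr = reinterpret_cast<std::intptr_t>(p);
    int* back_ok = reinterpret_cast<int*>(as_intptr);

    std::printf("int32 round-trip ok:  %s\n", back_bad == p ? "yes" : "NO");
    std::printf("intptr round-trip ok: %s\n", back_ok == p ? "yes" : "NO");
}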
Then there are race conditions, like having two threads where one is very, very slow, so that you instinctively assume it will always finish last and there is no code handling the scenario where "Captain Slow" finishes first. Changes in the underlying technologies can make these assumptions very wrong, very fast indeed.
Take a look at the upcoming breed of flash-based, super-mega-fast server storage: systems that can read and write gigabytes per second. Applications that assume I/O is significantly slower than some calculation on in-memory values will easily fail on this kind of fast storage.
I could go on and on, but I gotta run right now...
Cheers
Code optimization does not necessarily mean that debug symbols have to be left out. Do a debug build with code optimization turned on; then you can still debug the program, and maybe the error occurs now.
One easy thing to do is to turn on compiler warnings and hints, rebuild the project, and then fix all warnings/hints.
Cheers
If it is Delphi business code, with data-aware components etc., the following might not apply.
I, however, write machine vision code, which is a bit computational, and most of the unit tests are console based. I'm also involved with FPC, and over the years have tested a lot with it, partially as a hobby, partially in desperate situations where I wanted any hunch I could get.
Some standard tricks that I have tried (in decreasing order of usefulness):
1. Use -gv and run the code under Valgrind (in practice this means the application has to run on Linux/FreeBSD, but for computational code and unit tests that can be doable).
2. Compile using the FPC parameter -gt (= trash local variables, i.e. randomize local variables on procedure entry).
3. Modify the heap manager to randomize the data in the blocks it hands out (also applicable to Delphi code).
4. Try FPC's range/overflow checking and compiler hints.
5. Run on a Mac Mini (PowerPC) or Win64; due to totally different rules and memory layouts it can catch pretty funky things.
Tricks 2 and 3 together allow you to find most, if not all, initialization problems.
Try to find any clues, then go back to Delphi and search in a more focused way, debug, etc.
I do realize this is not easy. I have a lot of FPC experience and didn't have to figure everything out from scratch for these cases. Still, it might be worth a try, and it might be a motivation to start setting up non-visual subsystems and unit tests to be FPC-compatible and platform independent. Most of this work will be needed anyway, given the Delphi roadmap.
For such problems I always advise using log files.
Question: can you somehow detect the incorrect display from within the source code?
If not, my answer won't help you.
If yes, check for the incorrect state, and as soon as you find it, dump the stack to a log file (see post-mortem debugging for details about dumping and resymbolizing the stack).
If you see that some data has been corrupted but you don't know how and when this happened, extract a function that tests that data for validity (with logging when it fails), and call this function from more and more places during program execution (e.g. after each menu call). If you iterate this approach a few times, you have a good chance of finding the problem.
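As a sketch of that "validity check you call from everywhere" idea (C++ here, and every name in it is made up for illustration):

#include <cstdio>

// Hypothetical state we suspect is being corrupted somewhere.
struct DisplayState {
    int mode;            // valid values: 0..3
    int width, height;   // must be positive
};

DisplayState g_state;

// Validity check with logging: call it from more and more places (after each
// menu action, timer tick, paint, ...) to narrow down where the corruption
// first shows up. In the real thing you would also dump the call stack here.
void CheckStateValid(const char* where) {
    if (g_state.mode < 0 || g_state.mode > 3 ||
        g_state.width <= 0 || g_state.height <= 0) {
        std::fprintf(stderr, "state corrupt at %s: mode=%d size=%dx%d\n",
                     where, g_state.mode, g_state.width, g_state.height);
    }
}

int main() {
    g_state = {1, 800, 600};
    CheckStateValid("after init");
    g_state.mode = 7;                    // simulate the corruption
    CheckStateValid("after menu call");  // this call logs the problem
}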
Is this a local variable inside a procedure or function?
If so, then it lives on the stack, and will contain garbage. Depending on the execution path and compiler settings the garbage will change, potentially pushing your logic 'over the edge'.
--jeroen
Given your description of the problem, I think you had uninitialized data that you got away with without the optimizer but which blew up with optimization turned on.