Alea.cuBase and cuBLAS - F#

I'm starting down the exciting road of GPU programming, and if I'm going to do some heavyweight number-crunching, I'd like to use the best libraries that are out there. I would especially like to use cuBLAS from an F# environment. CUDAfy offers the full set of drivers from their solution, and I have also been looking at Alea.cuBase, which has thrown up a few questions.
The Alea.cuSamples project on GitHub makes a cryptic reference to an Examples solution: "For more advanced test, please go to the MatrixMul projects in the Examples solution." However, I can't find any trace of these mysterious projects.
Does anyone know the location of the elusive "MatrixMul projects in the Examples solution"?
Given that cuSamples performs a straightforward matrix multiplication, would the more advanced version, wherever it lives, use cuBLAS?
If not, is there a way to access cuBLAS from Alea.cuBase à la CUDAfy?
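For comparison with whatever cuBLAS returns, it helps to have a plain CPU reference. cuBLAS, like the Fortran BLAS it follows, stores matrices in column-major order, so the sketch below uses the same layout. It is a minimal illustration of the "straightforward" triple-loop multiplication, computing C = A * B the way cublasSgemm would with no transposes, alpha = 1 and beta = 0; the function name matmul_colmajor is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference for C = A * B using column-major storage (as cuBLAS expects).
 * A is m x k, B is k x n, C is m x n.
 * Element (row, col) of a matrix with `rows` rows lives at index row + col * rows. */
void matmul_colmajor(size_t m, size_t n, size_t k,
                     const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t col = 0; col < n; ++col) {
        for (size_t row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += A[row + p * m] * B[p + col * k];  /* A(row,p) * B(p,col) */
            }
            C[row + col * m] = acc;                      /* C(row,col) */
        }
    }
}
```

Comparing this against a GPU result element by element (with a small tolerance for float rounding) is a quick way to confirm that the row-major/column-major convention was handled correctly on the .NET side.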

With Alea GPU V2, the new version, we now have two options:
Alea Unbound library provides optimized matrix multiplication implementations: http://quantalea.com/static/app/tutorial/examples/unbound/matrixmult.html
Alea GPU has cuBLAS integrated; see the tutorial: http://quantalea.com/static/app/tutorial/examples/cublas/index.html

The matrixMulCUBLAS project is a C++ sample that ships with the CUDA SDK (https://developer.nvidia.com/cuda-downloads). It uses cuBLAS to get astonishingly quick matrix multiplication (139 GFLOPS) on my home laptop.
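A GFLOPS figure like this is conventionally derived by counting 2·m·n·k floating-point operations for an (m x k) by (k x n) product (one multiply and one add per inner-loop step) and dividing by the elapsed time. A minimal sketch of that arithmetic follows; the function name and the example timing are hypothetical, not numbers taken from the sample.

```c
#include <assert.h>

/* GFLOPS for a dense (m x k) * (k x n) matrix multiply that took `seconds`.
 * Each of the m*n outputs needs k multiplies and k adds: 2*m*n*k flops total. */
double matmul_gflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return (2.0 * m * n * k) / (seconds * 1e9);
}
```

For example, a 1024 x 1024 by 1024 x 1024 multiply that finishes in 10 ms works out to roughly 214.7 GFLOPS.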


Drake C++ tutorials

Can you recommend C++ tutorials (or point me in the right direction for learning Drake with C++), especially ones focused on robotic manipulators? Also, which visualizer do you use from C++, since MeshCat seems to be used only from Python or Julia?
Thank you in advance
Even though Drake's tutorials are written in Python, all of the same ideas apply to Drake's C++ library. Drake's Python library is just a thin wrapper over the C++ API, so the concepts, class names, and function names are all the same. Our hope is that the Python tutorials are a good starting point, even for users who plan to use Drake from C++.
To see various C++ sample code, you can also browse the examples:
https://github.com/RobotLocomotion/drake/tree/master/examples
The drake/examples/manipulation_station example involves an arm, gripper, and objects to manipulate.
Drake C++ does support MeshCat just fine.
Here is a third-party Drake C++ tutorial: https://drake.guzhaoyuan.com/
However, it is no longer maintained and some of its contents may be out of date. In this tutorial, the author describes how to write a BUILD.bazel file and use the Bazel build system, which I think is the major gap between the official Python tutorials and using Drake in C++. There seems to be no official document on how to use Bazel macros such as drake_cc_library or drake_cc_binary.
If you would like to use Drake C++ as an external dependency, here are some nice official examples https://github.com/RobotLocomotion/drake-external-examples

What makes libadalang special?

I have been reading about libadalang [1][2] and I am very impressed by it. However, I was wondering whether this technique has been used before, i.e. whether another language has a library for syntactically and semantically analyzing its code. Is this a unique approach?
C and C++: libclang "The C Interface to Clang provides a relatively small API that exposes facilities for parsing source code into an abstract syntax tree (AST), loading already-parsed ASTs, traversing the AST, associating physical source locations with elements within the AST, and other facilities that support Clang-based development tools." (See libtooling for a C++ API)
Python: See the ast module in the Python Language Services section of the Python Library manual. (The other modules can be useful, as well.)
Javascript: The ongoing ESTree effort is attempting to standardize parsing services over different Javascript engines.
C# and Visual Basic: See the .NET Compiler Platform ("Roslyn").
I'm sure there are lots more; those ones just came off the top of my head.
For a practical and theoretical grounding, you should definitely (re)visit the classic textbook Structure and Interpretation of Computer Programs by Abelson & Sussman (1st edition 1985, 2nd edition 1996), which helped popularise the idea of metacircular interpretation: treating a computer program as a formal data structure that can be interpreted (or otherwise analysed) programmatically.
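To make the "program as a data structure" idea concrete, here is a toy sketch (my own illustration, not code from SICP, libadalang or libclang): an arithmetic expression is stored as a tree, and the same tree is both interpreted (eval) and analysed without being run (count_muls). Real tools such as libadalang do the second kind of walk over a full syntax tree with semantic information attached.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny expression language represented as an abstract syntax tree. */
typedef enum { EXPR_NUM, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    long value;                    /* used when kind == EXPR_NUM */
    const struct Expr *lhs, *rhs;  /* used when kind == EXPR_ADD or EXPR_MUL */
} Expr;

/* Interpretation: walk the tree and compute its value. */
long eval(const Expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_NUM: return e->value;
    case EXPR_ADD: return eval(e->lhs) + eval(e->rhs);
    case EXPR_MUL: return eval(e->lhs) * eval(e->rhs);
    }
    return 0; /* not reached for well-formed trees */
}

/* Analysis: walk the same tree without evaluating it, counting multiplications. */
int count_muls(const Expr *e)
{
    assert(e != NULL);
    if (e->kind == EXPR_NUM)
        return 0;
    return (e->kind == EXPR_MUL) + count_muls(e->lhs) + count_muls(e->rhs);
}
```

Building (2 + 3) * 4 as nodes and passing the root to both functions gives 20 from eval and 1 from count_muls: one data structure, two very different programs operating on it.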
You can see "libadalang" as ASIS Mark II. AdaCore seems to be attempting to rethink ASIS so that it supports both what ASIS can already do and more lightweight operations that analyse the source without requiring it to compile.
Hopefully the final API will be nicer than that of ASIS.
So no, it is not a unique approach. It has already been done for Ada. (But I'm not aware of similar libraries for other languages.)

Verilog or Vivado HLS or Vivado SDSoC

I want to port my lane detection code, written in C++ with OpenCV, to an FPGA. Vivado HLS or Vivado SDSoC can help embed the C++ code into the FPGA, or I can rewrite the lane detection code in Verilog. The question is: what are the advantages and disadvantages of these three approaches?
I want to use one of the cheap Zynq-7000 FPGAs.
Verilog is considered low-level these days; think of it as the hardware equivalent of assembly. In the software domain, people use assembly only to get performance that they cannot attain with high-level languages such as C or Java.
In the hardware domain, C (for Vivado HLS) and OpenCL are considered high-level languages. OpenCL was developed with portability to other architectures like GPUs and CPUs in mind. However, it has a lot more overhead than Vivado HLS when it comes to communicating with the FPGA.
Vivado HLS by itself produces just hardware modules in VHDL or Verilog, which you still have to connect to FPGA pins, ARM processors, etc. It does not take care of the communication to your module. You will still have to integrate your module in a Vivado block design or top-level VHDL or Verilog implementation yourself.
SDSoC (not "Vivado SDSoC", by the way) also lets you write your entire implementation (hardware and software) in C. Under the hood, it invokes Vivado HLS to implement the hardware module. Afterwards, the tool takes care of implementing an interface between your hardware and the on-board ARM processors that will run the software.
In summary, I recommend SDSoC unless you have a good reason not to use it. I do want to warn you, however, that analyzing the synthesis results of Vivado HLS is a lot harder than analyzing Vivado output for Verilog or VHDL. Therefore, I always recommend making sure that your code works as a software implementation first. With minimal effort, you should be able to compile the same code with gcc or another compiler too. Don't use the synthesis results to debug your code, only to analyze its performance.
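As a sketch of that "software first" workflow, here is a hypothetical HLS-style kernel written in plain C: fixed-size arrays, no dynamic allocation and simple nested loops, which are the kinds of constructs HLS tools handle well. It marks pixels whose horizontal gradient exceeds a threshold, a crude stand-in for the edge-detection step of lane detection. The image size, function name and threshold are assumptions for illustration only; HLS directives such as PIPELINE would be added on top once this software version has been verified with gcc.

```c
#include <assert.h>  /* used only by the software testbench, not by the kernel */
#include <stdint.h>

#define IMG_W 8   /* hypothetical image width  */
#define IMG_H 4   /* hypothetical image height */

/* Writes 255 where the horizontal gradient |in[x+1] - in[x-1]| exceeds
 * `threshold` and 0 elsewhere; the left and right border columns are 0.
 * Images are stored row-major in fixed-size arrays. */
void edge_mask(const uint8_t in[IMG_H * IMG_W], uint8_t out[IMG_H * IMG_W],
               int threshold)
{
    for (int y = 0; y < IMG_H; ++y) {
        for (int x = 0; x < IMG_W; ++x) {
            int grad = 0;
            if (x > 0 && x < IMG_W - 1) {
                grad = (int)in[y * IMG_W + x + 1] - (int)in[y * IMG_W + x - 1];
                if (grad < 0)
                    grad = -grad;
            }
            out[y * IMG_W + x] = (grad > threshold) ? 255 : 0;
        }
    }
}
```

A testbench that feeds a synthetic step edge through edge_mask and checks the output mask runs in milliseconds under gcc, which is a much faster debugging loop than inspecting synthesis reports.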
SDSoC is better and easier; HLS is like a black box, and even its user guide (UG902) has so many pages.
That's only my own opinion.
Take a look at Xilinx XAPP1167 and the Xilinx HLS Video Library Wiki.
That appnote is a few years old (older than the SDSoC tools) but has a reference design for accelerating OpenCV applications in a Zynq using HLS.
I can't speak to SDSoC, but I would highly recommend starting with HLS over a rewrite in Verilog. It sounds like you have exactly an intended use-case for HLS: to implement existing C++ applications in an FPGA. The downsides to it are (1) you'll likely need to modify your code a bit, since HLS doesn't support all C++ features, and (2) the performance may not be quite as good as a pure Verilog implementation.
Even if you have hardware design experience, manually translating C++ to Verilog will require some significant effort. I'd avoid that approach unless HLS or SDSoC doesn't give you the performance you need.
Start with OpenCL, using either SDAccel or the Intel SDK. OpenCL has a verbose and well-defined API, which is a good thing. It is very easy to learn, and you get parallel code execution similar to multiple module instances in Verilog/VHDL. Compared with HLS, OpenCL has the benefit of not requiring you to reinvent the whole system for managing data, I/O, pipes, etc. You get quite a bit of helper logic in the OpenCL BSP (Intel) or shell (Xilinx). And yes, start reading those long guides.
I would recommend SDAccel, as it is much friendlier to C++ "software" users. Also, don't quote me on this, but I think they provide an OpenCV implementation out of the box, which means you probably only need to massage your non-OpenCV code to achieve the performance you want.

What will be the alternate of win32api for Linux? [duplicate]

I'm moving from Windows programming (by which I mean using the Windows API) to Linux programming.
For programming Windows, the option we have is the Win32 API (MFC is just a C++ wrapper around it).
I want to know whether there is something like a Linux API (equivalent to WinAPI) that is exposed directly to the programmer. Where can I find the reference?
From my limited knowledge of the POSIX library, I see that it wraps part of the Linux API. But what about creating GUI applications? POSIX doesn't offer that. I know there are tons of third-party widget toolkits like GTK, Qt, etc. But I don't want to use libraries that encapsulate the Linux API. I want to learn to use the "core Linux API".
If there is anything I should know, please tell me. Any programmer who is familiar with both Windows and Linux programming, please map the Linux terminology for me so that I can move on quickly.
Any resources (books, tutorials, references) are highly appreciated.
I think you're looking for something that doesn't exactly exist. Unlike the Win32 API, there is no "Linux API" for doing GUI applications. The closest you can get is the X protocol itself, which is a pretty low level way of doing GUI (it's much more detailed and archaic than Win32 GDI, for example). This is why there exist wrappers such as GTK and Qt that hide the details of the X protocol.
The X protocol is available to C programs through Xlib.
What you must understand is that Linux itself contains very little. The "core" Linux API is POSIX and glibc. Linux is NOT graphical by default, so there is no core graphics library. Really, Windows could also be stripped down to have no graphics, and thus none of the parts of the Win32 API like GDI. This is the key point: Linux is very lightweight compared to Windows.
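As a rough orientation for a Win32 programmer (my own approximate mapping, not an official one): GetTempFileName plus CreateFile, WriteFile, ReadFile, CloseHandle and DeleteFile correspond roughly to the POSIX calls mkstemp, write, read, close and unlink, which glibc exposes as thin wrappers over Linux system calls. The sketch below round-trips a string through a temporary file using only those calls; the function name roundtrip and the /tmp template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes `msg` to a fresh temporary file, reads it back into `buf`
 * (at most bufsize - 1 bytes, NUL-terminated), then deletes the file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t roundtrip(const char *msg, char *buf, size_t bufsize)
{
    assert(bufsize > 0);
    char path[] = "/tmp/roundtripXXXXXX";
    int fd = mkstemp(path);              /* roughly GetTempFileName + CreateFile */
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);     /* roughly WriteFile */
    if (n < 0 || (size_t)n != len || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    n = read(fd, buf, bufsize - 1);      /* roughly ReadFile */
    close(fd);                           /* roughly CloseHandle */
    unlink(path);                        /* roughly DeleteFile */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Note that file descriptors are plain ints rather than opaque HANDLEs, and errors are reported through return values plus errno rather than GetLastError.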
For Linux there are two main graphical toolkits, GTK and Qt. I myself prefer GTK, but I'd research both. Also note that GTK and Qt exist for Windows too, because they are just wrappers. If you take a look at the X protocol code for, say, xterm, you'll see why no one actually tries to create graphical applications directly on top of it.
SDL is also pretty nice. It is fairly bare, but it's good if you just need a framebuffer for a window. It is portable between Linux and Windows and very easy to learn, but it will only stretch so far.
Linux and Windows aren't quite as different as they look.
On both systems there is a kernel that is not graphical.
It's just that Microsoft doesn't document this kernel's interface, and instead publishes an API that spans various different components.
On Unix, it's more transparent. There really is a (non-GUI) kernel API and it is published. Then, there are services that run on top of this, optionally, and their interfaces are published without an attempt to merge them into an imaginary layer that doesn't really exist.
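To see that the kernel API really is a separate, published layer beneath the C library, here is a small Linux-only sketch: glibc's getpid() wrapper and the generic syscall() entry point called with the SYS_getpid number from <sys/syscall.h> reach the same kernel service. The helper name pid_paths_agree is hypothetical; syscall() is a glibc function that is only declared when _GNU_SOURCE (or _DEFAULT_SOURCE) is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the C library wrapper and the raw system call agree. */
int pid_paths_agree(void)
{
    pid_t via_libc = getpid();            /* POSIX function provided by glibc */
    long via_kernel = syscall(SYS_getpid); /* direct entry into the kernel API */
    assert(via_kernel > 0);
    return (long)via_libc == via_kernel;
}
```

In day-to-day code you would use the POSIX wrapper; the raw syscall() form is mainly useful for kernel features that glibc does not (yet) wrap.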
So the lowest GUI level is the X Window System, and its lowest-level library is Xlib. There are various libraries that run on top of it, as you have noted.
I would highly recommend looking at the Qt/C++ UI framework; it's arguably the most comprehensive UI toolkit for any platform.
We're using it at work to develop cross-platform apps that run on Windows, OS X and Linux.
It also runs on Nokia's smartphone operating system Maemo, which has recently been merged with Intel's Moblin Linux OS into what is now called MeeGo.
This is going to sound insane since you're asking about "serious" stuff like C++ and C (and the "core linux API"), but you might want to consider building in something else. For instance:
Java Swing (many people love it! Others hate it and call it obsolete)
Mono GTK# (C#, Visual Basic or whatever you want; lots of people say it's pretty cool, but there aren't that many of them)
Adobe AIR (ActionScript, you might hate it)
Titanium (totally new and unproven, but getting a lot of buzz in the iPhone world, at least)
And many other possibilities, some of which let you work on multiple platforms at once.
Sorry if this answer is not at all what you're looking for. The "real" answers on Linux are "pick a toolkit," which is also no answer at all :)
Have a look at Cairo. It is roughly similar to GDI+ and sits under the hood of some of the few usable GUI programs for Linux, e.g. Firefox or Eclipse (SWT). It wraps most of the nasty and ancient Linux stuff for you in a nice API that runs on most Linux installations without locking you into an entire subsystem like GTK or Qt.
There is also the documentation for the two major desktop platforms, GNOME and KDE, which might help you down that road.

DirectCompute information

I've been trying to make use of the GPU as part of a project of mine. I've looked into both CUDA and OpenCL, but the lack of information showing you how to introduce these into a project is shocking. Even their dedicated forum groups are dead. So now, I'm looking into DirectCompute.
From what I can tell, it's simply a new type of shader file that uses HLSL. My question is this: does my program (aside from targeting DirectX 10/11) need its structure changed?
I mean, is it simply a case of creating the CS file, setting in the project like I would any other shader, and watch the magic happen?
Any information on this would be appreciated.
Yes, compute shaders (CS) fit into the usual DirectX programming structure, and they work in a similar way to CUDA/OpenCL. Here is a good, simple example:
http://openvidia.sourceforge.net/index.php/DirectCompute
Personally, I would suggest using CUDA/OpenCL rather than going the DirectCompute route if your project does not involve graphics; I think CUDA/OpenCL are better suited to general-purpose computing. Documentation can be a little difficult to find, but these are the main aspects of GPU programming:
Setting up data on the CPU to pass to the GPU.
Understanding how many warps/threads need to be started on the GPU, how threads might need to communicate, etc.
Computing on the GPU, reading data back on the CPU
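The "how many threads" step usually comes down to a ceiling division: pick a block size (a thread group in DirectCompute, declared with [numthreads] in HLSL and launched with Dispatch), then launch enough blocks to cover every element, with the kernel skipping indices past the end. The sketch below shows that host-side arithmetic; the helper names are hypothetical, and 32 is the warp size of current NVIDIA GPUs.

```c
#include <assert.h>

/* Number of blocks (thread groups) needed so that
 * blocks * threads_per_block >= n_elements.
 * Assumes n_elements + threads_per_block - 1 does not overflow. */
unsigned blocks_needed(unsigned n_elements, unsigned threads_per_block)
{
    assert(threads_per_block > 0);
    return (n_elements + threads_per_block - 1) / threads_per_block;
}

/* Warps occupied by one block, given the hardware warp size. */
unsigned warps_per_block(unsigned threads_per_block, unsigned warp_size)
{
    assert(warp_size > 0);
    return (threads_per_block + warp_size - 1) / warp_size;
}
```

For 1000 elements and 256-thread blocks this gives 4 blocks (1024 threads), so the last 24 threads must bounds-check and do nothing. Block sizes that are multiples of the warp size avoid partially filled warps.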
Another option is C++ AMP; please follow the links from here for more info, and feel free to post questions as you have them: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/09/13/c-amp-in-a-nutshell.aspx
The easiest way is to make a project that uses CS from C# with SlimDX.
And here is a good site covering the basics of how to use CS from within C# code.
Later on you can move to full scale CS exploration with C++ and DirectX 11.
