AVX2/AVX512 for modulo operations? - avx

I'm experimenting with AVX instructions for ECDSA. Can AVX2/AVX512 be leveraged to perform modulo operations? If so, how?
Thanks

Related

How to evaluate the performance of facial recognition library?

I am tasked with evaluating the performance of two separate SDK solutions.
I would like to ask what's the suggested way to evaluate which one is better in terms of precision?
thanks

Lua library for data analysis (data frames)

Is there any Lua implementation of data frames, i.e. structures for data analysis, something like Python's pandas? I want to do some statistical operations using LuaJIT.
Yes, there is now. Check out torch-dataframe, which Alex and I are developing. Our main priority is reliability, which we check with a rich test suite. Performance comes second, although we try to avoid performance hogs within the limits of Lua.
The package is currently far from pandas in sophistication, but feel free to contribute any methods you feel are lacking.
You may want to look at Torch7 that provides N-dimensional arrays with support for various statistical and mathematical operations and is based on LuaJIT.

How are vector operations possible on a matrix that does not fit in memory?

How is it possible to do calculations on a 6 GB matrix when only 4 GB of RAM is available? What techniques are used in this case? Is there any open source solution or tool that works from files during vector operations?
Yes, the famous Hadoop is an open source computing platform, which can be used for operations on pretty big matrices (and not only for that).
For examples, please read this page.
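On a single machine, a simpler technique than Hadoop is to stream the matrix from disk in blocks of rows, so that only one block is in RAM at a time. The following is a minimal sketch of that idea, not something taken from the answer above. It assumes the matrix is stored as raw row-major doubles in a binary file; the function name matvec_from_file and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: computes y = A * x, where A (rows x cols, row-major
 * doubles) is stored in a binary file and read chunk_rows rows at a time.
 * Peak memory use is chunk_rows * cols doubles plus the vectors x and y.
 * Returns 0 on success, -1 on I/O or allocation failure. */
int matvec_from_file(const char *path, size_t rows, size_t cols,
                     const double *x, double *y, size_t chunk_rows)
{
    assert(chunk_rows > 0 && cols > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    double *buf = malloc(chunk_rows * cols * sizeof *buf);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    int status = 0;
    for (size_t r0 = 0; r0 < rows; r0 += chunk_rows) {
        /* The last block may hold fewer than chunk_rows rows. */
        size_t n = (rows - r0 < chunk_rows) ? rows - r0 : chunk_rows;

        if (fread(buf, sizeof *buf, n * cols, f) != n * cols) {
            status = -1; /* short read: file smaller than rows * cols */
            break;
        }

        /* Dot product of each row in the block with x. */
        for (size_t i = 0; i < n; i++) {
            double acc = 0.0;
            for (size_t j = 0; j < cols; j++)
                acc += buf[i * cols + j] * x[j];
            y[r0 + i] = acc;
        }
    }

    free(buf);
    fclose(f);
    return status;
}
```

With a 6 GB matrix and 4 GB of RAM, chunk_rows would be chosen so that chunk_rows * cols * 8 bytes stays well below the available memory. Operations that need several passes over the data (for example matrix-matrix products) follow the same pattern but reread blocks from the file.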

SIMD math libraries for SSE and AVX

I am looking for SIMD math libraries (preferably open source) for SSE and AVX. For example, if I have an AVX register v with 8 float values, I want sin(v) to return the sine of all eight values at once.
AMD has a proprietary library, LibM http://developer.amd.com/tools/cpu-development/libm/ which has some SIMD math functions, but LibM only uses AVX if it detects FMA4, which Intel CPUs don't have. Also, I'm not sure it fully uses AVX, as all the function names end in s4 (d2) and not s8 (d4). It gives better performance than the standard math libraries on Intel CPUs, but it's not much better.
Intel has the SVML as part of its C++ compiler, but the compiler suite is very expensive on Windows. Additionally, Intel cripples the library on non-Intel CPUs.
I found the following AVX library, http://software-lisc.fbk.eu/avx_mathfun/, which supports a few math functions (exp, log, sin, cos, and sincos). It gives very fast results for me, faster than SVML, but I have not checked the accuracy. It only works on single-precision floating point and does not work in Visual Studio (though that would be easy to fix). It's based on another SSE library.
Does anyone have any other suggestions?
Edit: I found a SO thread that has many answers on this subject
Vectorized Trig functions in C?
I have implemented Vecmathlib https://bitbucket.org/eschnett/vecmathlib/ as a generic library for two other projects (The Einstein Toolkit, and pocl http://pocl.sourceforge.net/). Vecmathlib is open source, and is written in C++.
Gromacs is a highly optimized molecular dynamics software package written in C++ that makes use of SIMD. As far as I know, the SIMD math functionality has not yet been split out into a separate library, but I guess the implementation might be useful for others nonetheless.
https://github.com/gromacs/gromacs/blob/master/src/gromacs/simd/simd_math.h
http://manual.gromacs.org/documentation/2016.4/doxygen/html-lib/simd__math_8h.xhtml

How do I Perform Integer SIMD operations on the iPad A4 Processor?

I feel the need for speed. Double for loops are killing my iPad app's performance. I need SIMD. How do I perform integer SIMD operations on the iPad A4 processor?
Thanks,
Doug
The instruction set is NEON, intrinsics reference
I've never been able to find good documentation on what they all actually are. But you pick it up pretty quickly if you've had any exposure to SSE.
To get the fastest speed, you will have to write ARM assembly language code that uses NEON SIMD operations. C compilers generally don't generate very good SIMD code, so hand-written assembly will make a big difference. I have a brief intro here: http://www.shervinemami.co.cc/iphoneAssembly.html
Note that the iPad A4 uses the ARMv7-A CPU, so the reference manual for the NEON SIMD instructions is at: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b/index.html
(but it's 2000 pages long and requires an understanding of assembly code and perhaps SIMD in general!).
