HLSL min16float fails to compile - directx

I have a shader:
float4 Test_PixelShader(float2 inTex:TEXCOORD,
                        uniform int mode ):COLOR
{
   if(mode) // half
   {
      min16float2 t=(min16float2)inTex;
      min16float2 r=1;
      for(int i=0; i<256; i++)
      {
         r+=t*r;
      }
      return float4(r.x, r.y, mode, 1);
   }else
   {
      float2 t=inTex;
      float2 r=1;
      for(int i=0; i<256; i++)
      {
         r+=t*r;
      }
      return float4(r.x, r.y, mode, 1);
   }
}
#define TECHNIQUE5(name, vs, ps) technique11 name{pass p0{SetVertexShader(CompileShader(vs_5_0, vs)); SetPixelShader(CompileShader(ps_5_0, ps));}}
TECHNIQUE5(Test , Draw_VertexShader(), Test_PixelShader(0));
TECHNIQUE5(Test1, Draw_VertexShader(), Test_PixelShader(1));
Which I'm compiling using:
#define FLAGS_DX11 (D3DCOMPILE_ENABLE_BACKWARDS_COMPATIBILITY|D3DCOMPILE_OPTIMIZATION_LEVEL3|D3DCOMPILE_NO_PRESHADER)
D3DCompile(data.data(), data.elms(), null, d3d_macros.data(), &Include11(src), null, "fx_5_0", FLAGS_DX11, 0, &buffer, &error);
Compilation fails, and the only error message I get is:
warning X4717: Effects deprecated for D3DCompiler_47
No errors, just a warning, yet the shader blob is null.
However if I replace all min16float2 with float2, then compilation works OK.
How to get min16float working?
I've read the https://gpuopen.com/first-steps-implementing-fp16/ article, and it mentions it should work OK.
Do I need to use https://github.com/microsoft/DirectXShaderCompiler instead of D3DCompile from Win10SDK?
I'm on Windows 10, using the latest Windows SDK.

Looks like it fails because Microsoft abandoned the "fx_*" targets, and you have to compile with the individual "vs_*"/"ps_*" targets instead.
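For instance, here is a minimal sketch of compiling just the pixel shader with a per-stage target, reusing the data/d3d_macros/Include11/FLAGS_DX11 pieces from the question. Note that with a plain "ps_5_0" target the entry point name must be passed to D3DCompile, and the uniform mode parameter is no longer baked in at compile time, so the Test/Test1 variants would have to be selected some other way (e.g. with a macro):
// Sketch: compile one entry point per stage instead of the whole fx_5_0 effect.
Include11 include(src);                    // include handler from the question
ID3DBlob *ps_blob = nullptr, *ps_error = nullptr;
HRESULT hr = D3DCompile(data.data(), data.elms(), nullptr,
                        d3d_macros.data(), &include,
                        "Test_PixelShader",            // entry point (required for non-fx targets)
                        "ps_5_0",                      // per-stage target instead of "fx_5_0"
                        FLAGS_DX11, 0, &ps_blob, &ps_error);
if(ps_error)
   OutputDebugStringA((const char*)ps_error->GetBufferPointer()); // real min16float diagnostics show up here
The same pattern applied with "vs_5_0" to the vertex shader entry point replaces the technique11 blocks entirely.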

Related

How to compile two versions of metal files

I want to support both 10.13 and 10.14, however I want to support fast math on 10.14. I am only able to compile the project if I force #define __CIKERNEL_METAL_VERSION__ 200, but this means it will crash on 10.13. How do I configure the project so it creates 2 Metal libraries? So far the result file is default.metallib (compiling using Xcode).
BOOL supportsMetal;
#if TARGET_OS_IOS
supportsMetal = MTLCreateSystemDefaultDevice() != nil; //this forces the GPU on a MacBook to switch immediately
#else
supportsMetal = [MTLCopyAllDevices() count] >= 1;
#endif
if (@available(macOS 10.13, *)) {
    //only 10.14 fully supports Metal with fast math, however there are hackintoshes etc...
    if (supportsMetal) {
        _kernel = [self metalKernel];
    } else {
        _kernel = [self GLSLKernel];
    }
} else {
    _kernel = [self GLSLKernel];
}
if (_kernel == nil) return nil;
METAL file
#include <metal_stdlib>
using namespace metal;
//https://en.wikipedia.org/wiki/List_of_monochrome_and_RGB_palettes
//https://en.wikipedia.org/wiki/Relative_luminance
//https://en.wikipedia.org/wiki/Grayscale
//<CoreImage/CIKernelMetalLib.h>
//only if you enable fast math (macOS10.14 or iOS12) otherwise fall back to float4 instead of half4
//forcing compilation for macOS 10.14+//iOS12+
#define __CIKERNEL_METAL_VERSION__ 200
constant half3 kRec709Luma = half3(0.2126, 0.7152, 0.0722);
constant half3 kRec601Luma = half3(0.299 , 0.587 , 0.114);
//constant float3 kRec2100Luma = float3(0.2627, 0.6780, 0.0593);
#include <CoreImage/CoreImage.h>
extern "C" { namespace coreimage {
float lumin601(half3 p)
{
return dot(p.rgb, kRec601Luma);
}
float lumin709(half3 p)
{
return dot(p.rgb, kRec709Luma);
}
half4 thresholdFilter(sample_h image, float threshold)
{
half4 pix = unpremultiply(image);
float luma = lumin601(pix.rgb);
pix.rgb = half3(step(threshold, luma));
return premultiply(pix);
}
}}
Xcode 11 supports Metal libraries.
Add a new build target to your project.
Add the .metal files to Compile Sources.
If you use Core Image, add these linker flags. Change the deployment targets (iOS 12+) and check for fast math.
To your original project target, add the new dependencies and a copy script:
cp "${BUILT_PRODUCTS_DIR}"/*.metallib "${METAL_LIBRARY_OUTPUT_DIR}"
Optional:
Avoid hard-coded strings everywhere in the project. Add an xcconfig file to the project:
MY_METAL_LIBRARY_NAME_10_13 = Metal_10_13_aaa
MY_METAL_LIBRARY_NAME_10_14 = Metal_10_14_bbb
GCC_PREPROCESSOR_DEFINITIONS = $(inherited) MY_METAL_LIBRARY_NAME_10_13='@"$(MY_METAL_LIBRARY_NAME_10_13)"' MY_METAL_LIBRARY_NAME_10_14='@"$(MY_METAL_LIBRARY_NAME_10_14)"'
Add the xcconfig as a configuration (don't set it on the project or you will end up with a double import).
Set the PRODUCT_NAME of each Metal library target to the corresponding variable.
Use the preprocessor variables in code:
static NSString *const kMetallibExtension = @"metallib";
NSString *const kMetalLibraryOldTarget = MY_METAL_LIBRARY_NAME_10_13; //@"Metal_10_13";
NSString *const kMetalLibraryFastMathTarget = MY_METAL_LIBRARY_NAME_10_14; //@"Metal_10_14";
+ (NSString *)metalLibraryName
{
    if (@available(macOS 10.14, *)) {
        return kMetalLibraryFastMathTarget;
    } else {
        return kMetalLibraryOldTarget;
    }
    //use default
    //return @"default";
}

OpenCV VideoWriter won't write anything, although cvWriteToAVI does

I've been trying to capture video from a cam and write it into an AVI file. I'm using Qt 4.8.2 with MSVC 2010 (x86) on Windows 7. I have 2 versions of the code: one using cv::Mat and the other using IplImage*. However, only the IplImage* version is working. Here's my code using cv::Mat:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
using namespace cv;
int main() {
    VideoCapture* capture2 = new VideoCapture( CV_CAP_DSHOW );
    Size size2 = Size(640,480);
    int codec = CV_FOURCC('M', 'J', 'P', 'G');
    VideoWriter* writer2 = new VideoWriter("video.avi",codec,15,size2);
    int a = 100;
    Mat frame2;
    while ( a > 0 ) {
        capture2->read(frame2);
        writer2->write(frame2);
        a--;
    }
    writer2->release();
    capture2->release();
    return 0;
}
And here's the code using IplImage*:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
int main() {
    CvCapture* capture = cvCaptureFromCAM( CV_CAP_DSHOW );
    CvSize size = cvSize(640,480);
    int codec = CV_FOURCC('M', 'J', 'P', 'G');
    CvVideoWriter* writer = cvCreateVideoWriter("video.avi",codec,15,size);
    int a = 100;
    while ( a > 0 ) {
        IplImage* frame = cvQueryFrame( capture );
        cvWriteToAVI(writer,frame);
        a--;
    }
    cvReleaseVideoWriter(&writer);
    cvReleaseCapture( &capture );
    return 0;
}
It's basically the same, or at least it looks like the same thing to me. It reads 100 frames and should write them into "video.avi". It compiles and runs without errors, but the cv::Mat version doesn't write anything, and the IplImage* version works perfectly.
Does someone have any idea on what's going on?
The syntax in the OpenCV C++ reference is a bit different; here is working code in C++.
I just added imshow and waitKey for checking; you can remove them if you want.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
using namespace cv;
using namespace std;
int main()
{
    VideoCapture* capture2 = new VideoCapture(CV_CAP_DSHOW);
    Size size2 = Size(640, 480);
    int codec = CV_FOURCC('M', 'J', 'P', 'G');
    // Unlike in C, here we use an object of the class VideoWriter//
    VideoWriter writer2("video_.avi", codec, 15.0, size2, true);
    writer2.open("video_.avi", codec, 15.0, size2, true);
    if (writer2.isOpened())
    {
        int a = 100;
        Mat frame2;
        while (a > 0)
        {
            capture2->read(frame2);
            imshow("live", frame2);
            waitKey(100);
            writer2.write(frame2);
            a--;
        }
    }
    else
    {
        cout << "ERROR while opening" << endl;
    }
    // No need to release the writer as the destructor will be called automatically
    capture2->release();
    return 0;
}
I had the same problem over and over again, and none of the solutions I found online helped.
Strangely enough, the problem (identified purely by trial and error) was with write permissions. Everything worked after I ran sudo chmod u+rwx on the Python script.
I had the same problem, and after some time I realized that the input frames aren't the same size as the output. Resizing the input frames may help:
capture2->read(frame2);
cv::resize(frame2, frame2, cv::Size(640,480));
writer2->write(frame2);
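Putting those two points together (check that the writer actually opened, and make sure every frame matches the size the writer was created with), here is a minimal self-contained sketch using the same OpenCV 2.x C++ API as above; the camera index 0 and the 640x480 output size are assumptions:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;

int main() {
    VideoCapture capture(0);                     // camera index 0 (assumption)
    Size outSize(640, 480);                      // size the writer is opened with
    VideoWriter writer("video.avi", CV_FOURCC('M','J','P','G'), 15.0, outSize, true);
    if (!capture.isOpened() || !writer.isOpened()) return 1;  // bail out early if either fails
    Mat frame, resized;
    for (int i = 0; i < 100; ++i) {
        if (!capture.read(frame) || frame.empty()) break;
        resize(frame, resized, outSize);         // frame size must match the writer's size
        writer.write(resized);
    }
    return 0;                                    // writer/capture released by their destructors
}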

Loading .fbx models into directX 10

I'm trying to load in meshes into DirectX 10. I've created a bunch of classes that handle it and allow me to call in a mesh with only a single line of code in my main game class.
However, when I run the program this is what renders:
In the debug output window the following errors keep appearing:
D3D10: ERROR: ID3D10Device::DrawIndexed: Input Assembler - Vertex Shader linkage error: Signatures between stages are incompatible. The reason is that Semantic 'TEXCOORD' is defined for mismatched hardware registers between the output stage and input stage. [ EXECUTION ERROR #343: DEVICE_SHADER_LINKAGE_REGISTERINDEX ]
D3D10: ERROR: ID3D10Device::DrawIndexed: Input Assembler - Vertex Shader linkage error: Signatures between stages are incompatible. The reason is that the input stage requires Semantic/Index (POSITION,0) as input, but it is not provided by the output stage. [ EXECUTION ERROR #342: DEVICE_SHADER_LINKAGE_SEMANTICNAME_NOT_FOUND ]
The thing is, I've no idea how to fix this. The code I'm using does work and I've simply brought all of it into a new project of mine. There are no build errors, and this only appears when the game is running.
The .fx file is as follows:
float4x4 matWorld;
float4x4 matView;
float4x4 matProjection;
struct VS_INPUT
{
    float4 Pos:POSITION;
    float2 TexCoord:TEXCOORD;
};
struct PS_INPUT
{
    float4 Pos:SV_POSITION;
    float2 TexCoord:TEXCOORD;
};
Texture2D diffuseTexture;
SamplerState diffuseSampler
{
    Filter = MIN_MAG_MIP_POINT;
    AddressU = WRAP;
    AddressV = WRAP;
};
//
// Vertex Shader
//
PS_INPUT VS( VS_INPUT input )
{
    PS_INPUT output=(PS_INPUT)0;
    float4x4 viewProjection=mul(matView,matProjection);
    float4x4 worldViewProjection=mul(matWorld,viewProjection);
    output.Pos=mul(input.Pos,worldViewProjection);
    output.TexCoord=input.TexCoord;
    return output;
}
//
// Pixel Shader
//
float4 PS(PS_INPUT input ) : SV_Target
{
    return diffuseTexture.Sample(diffuseSampler,input.TexCoord);
    //return float4(1.0f,1.0f,1.0f,1.0f);
}
RasterizerState NoCulling
{
    FILLMODE=SOLID;
    CULLMODE=NONE;
};
technique10 Render
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( CompileShader( ps_4_0, PS() ) );
        SetRasterizerState(NoCulling);
    }
}
In my game, the .fx file and model are called and set as follows:
Loading in shader file
//Set the shader flags - BMD
DWORD dwShaderFlags = D3D10_SHADER_ENABLE_STRICTNESS;
#if defined( DEBUG ) || defined( _DEBUG )
dwShaderFlags |= D3D10_SHADER_DEBUG;
#endif
ID3D10Blob * pErrorBuffer=NULL;
if( FAILED( D3DX10CreateEffectFromFile( TEXT("TransformedTexture.fx" ), NULL, NULL, "fx_4_0", dwShaderFlags, 0, md3dDevice, NULL, NULL, &m_pEffect, &pErrorBuffer, NULL ) ) )
{
char * pErrorStr = ( char* )pErrorBuffer->GetBufferPointer();
//If the creation of the Effect fails then a message box will be shown
MessageBoxA( NULL, pErrorStr, "Error", MB_OK );
return false;
}
//Get the technique called Render from the effect, we need this for rendering later on
m_pTechnique=m_pEffect->GetTechniqueByName("Render");
//Number of elements in the layout
UINT numElements = TexturedLitVertex::layoutSize;
//Get the Pass description, we need this to bind the vertex to the pipeline
D3D10_PASS_DESC PassDesc;
m_pTechnique->GetPassByIndex( 0 )->GetDesc( &PassDesc );
//Create Input layout to describe the incoming buffer to the input assembler
if (FAILED(md3dDevice->CreateInputLayout( TexturedLitVertex::layout, numElements,PassDesc.pIAInputSignature, PassDesc.IAInputSignatureSize, &m_pVertexLayout ) ) )
{
return false;
}
model loading:
m_pTestRenderable=new CRenderable();
//m_pTestRenderable->create<TexturedVertex>(md3dDevice,8,6,vertices,indices);
m_pModelLoader = new CModelLoader();
m_pTestRenderable = m_pModelLoader->loadModelFromFile( md3dDevice,"armoredrecon.fbx" );
m_pGameObjectTest = new CGameObject();
m_pGameObjectTest->setRenderable( m_pTestRenderable );
// Set primitive topology, how we are going to interpret the vertices in the vertex buffer
md3dDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
if ( FAILED( D3DX10CreateShaderResourceViewFromFile( md3dDevice, TEXT( "armoredrecon_diff.png" ), NULL, NULL, &m_pTextureShaderResource, NULL ) ) )
{
MessageBox( NULL, TEXT( "Can't load Texture" ), TEXT( "Error" ), MB_OK );
return false;
}
m_pDiffuseTextureVariable = m_pEffect->GetVariableByName( "diffuseTexture" )->AsShaderResource();
m_pDiffuseTextureVariable->SetResource( m_pTextureShaderResource );
Finally, the draw function code:
//All drawing will occur between the clear and present
m_pViewMatrixVariable->SetMatrix( ( float* )m_matView );
m_pWorldMatrixVariable->SetMatrix( ( float* )m_pGameObjectTest->getWorld() );
//Get the stride(size) of a vertex, we need this to tell the pipeline the size of one vertex
UINT stride = m_pTestRenderable->getStride();
//The offset from start of the buffer to where our vertices are located
UINT offset = m_pTestRenderable->getOffset();
ID3D10Buffer * pVB=m_pTestRenderable->getVB();
//Bind the vertex buffer to input assembler stage -
md3dDevice->IASetVertexBuffers( 0, 1, &pVB, &stride, &offset );
md3dDevice->IASetIndexBuffer( m_pTestRenderable->getIB(), DXGI_FORMAT_R32_UINT, 0 );
//Get the Description of the technique, we need this in order to loop through each pass in the technique
D3D10_TECHNIQUE_DESC techDesc;
m_pTechnique->GetDesc( &techDesc );
//Loop through the passes in the technique
for( UINT p = 0; p < techDesc.Passes; ++p )
{
//Get a pass at current index and apply it
m_pTechnique->GetPassByIndex( p )->Apply( 0 );
//Draw call
md3dDevice->DrawIndexed(m_pTestRenderable->getNumOfIndices(),0,0);
//m_pD3D10Device->Draw(m_pTestRenderable->getNumOfVerts(),0);
}
Is there anything I've clearly done wrong or am missing? I've spent 2 weeks trying to work out what on earth I've done wrong, to no avail.
Any insight a fresh pair of eyes could give on this would be great.
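For what it's worth, both errors describe a mismatch between the input layout bound to the Input Assembler and the VS_INPUT struct in the .fx above (POSITION is not provided, and TEXCOORD ends up in a different register). The project's actual TexturedLitVertex::layout isn't shown, but an element description that matches the shader would look roughly like this sketch (formats and offsets assume a tightly packed float3 position followed by a float2 UV, and must match the real vertex structure):
// Sketch of a D3D10 input layout matching VS_INPUT (Pos:POSITION, TexCoord:TEXCOORD).
const D3D10_INPUT_ELEMENT_DESC layout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,  D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 },
};
const UINT numElements = sizeof(layout) / sizeof(layout[0]);
// Created against the technique's pass signature, exactly as in the code above:
// md3dDevice->CreateInputLayout(layout, numElements,
//                               PassDesc.pIAInputSignature, PassDesc.IAInputSignatureSize,
//                               &m_pVertexLayout);
The second error in particular says the layout never declares a POSITION element at all, so comparing TexturedLitVertex::layout against VS_INPUT semantic by semantic is the place to start.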

Memory Issues with cvShowImage and Kinect SDK: Skeletal Viewer

I'm using cvSetData to get the RGB frame into one I can use for OpenCV.
I modified the SkeletalViewer sample slightly to produce the RGB stream.
void CSkeletalViewerApp::Nui_GotVideoAlert( )
{
    const NUI_IMAGE_FRAME * pImageFrame = NULL;
    IplImage* kinectColorImage = cvCreateImage(cvSize(640,480),IPL_DEPTH_8U, 4);
    HRESULT hr = NuiImageStreamGetNextFrame(
        m_pVideoStreamHandle,
        0,
        &pImageFrame );
    if( FAILED( hr ) )
    {
        return;
    }
    NuiImageBuffer * pTexture = pImageFrame->pFrameTexture;
    KINECT_LOCKED_RECT LockedRect;
    pTexture->LockRect( 0, &LockedRect, NULL, 0 );
    if( LockedRect.Pitch != 0 )
    {
        BYTE * pBuffer = (BYTE*) LockedRect.pBits;
        m_DrawVideo.DrawFrame( (BYTE*) pBuffer );
        cvSetData(kinectColorImage, (BYTE*) pBuffer, kinectColorImage->widthStep);
        cvShowImage("Color Image", kinectColorImage);
        //cvReleaseImage( &kinectColorImage );
        cvWaitKey(10);
    }
    else
    {
        OutputDebugString( L"Buffer length of received texture is bogus\r\n" );
    }
    cvReleaseImage(&kinectColorImage);
    NuiImageStreamReleaseFrame( m_pVideoStreamHandle, pImageFrame );
}
With the cvReleaseImage, I would get a cvException error. Not exactly sure which one as it didn't specify. Without cvReleaseImage, I would get the rgb video running in an openCV window but would eventually crash because it ran out of memory.
How should I release the image properly?
Just solved this problem.
After a bunch of sleuthing using breakpoints and debugging, it appears as though the problem has to do with the pointers used in cvSetData. My best guess is that Nui_GotVideoAlert() updates the address pointed to by pBuffer before cvReleaseImage is called. In addition, cvSetData never appears to copy the bytes from this address.
What happens then is that cvReleaseImage is called on an address that no longer exists.
I fixed this by declaring kinectColorImage at the top of NuiImpl.cpp, calling cvSetData in ::Nui_GotVideoAlert(), and only calling cvReleaseImage in the Nui_Uninit() method. This way, kinectColorImage will just update instead of creating a new IplImage in each call of Nui_GotVideoAlert().
That's strange. As far as I know, cvReleaseImage releases both the image header and the image data. I ran the piece of code below, and in this particular example cvReleaseImage does not free the buffer that contains the data. Here I didn't use cvSetData; I just updated the pointer to the image data. If you uncomment the commented lines and comment the ones just below each one, the program still runs but you'll get some memory leaks. I used OpenCV 2.2 (this is the legacy interface).
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <stdlib.h>
#define NLOOPS 1000
int main(void){
    int i, j;
    char *buff = (char *) malloc( sizeof(char) * 3 * 640 * 480 );
    for( i = 0; i < 640 * 480 * 3; i++ ) buff[i] = 128;
    j = 0;
    while( j++ < NLOOPS ){
        IplImage *im = cvCreateImage(cvSize(640,480),IPL_DEPTH_8U, 3);
        //cvSetData(im, buff, im->widthStep); ---> If you use this version you'll get memory leaks. Comment the line below.
        im->imageData = buff;
        cvWaitKey(4);
        cvShowImage("kk", im);
        //cvReleaseImageHeader(&im); ---> If you use this version you'll get memory leaks. Comment the line below.
        cvReleaseImage(&im);
        free(im);
    }
    free(buff);
    return 0;
}
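Following on from the cvReleaseImageHeader comment in the code above: if the header is created with cvCreateImageHeader, it never owns any pixel data, so releasing just the header at shutdown neither leaks nor frees a buffer it doesn't own. Here is a minimal sketch of that variation (my own, not the original poster's code; the malloc'd buffer stands in for the Kinect's LockedRect.pBits):
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <stdlib.h>

int main(void)
{
    /* stand-in for the Kinect's LockedRect.pBits buffer */
    unsigned char* buffer = (unsigned char*)malloc(640 * 480 * 4);
    for (int i = 0; i < 640 * 480 * 4; i++) buffer[i] = 128;

    IplImage* header = cvCreateImageHeader(cvSize(640, 480), IPL_DEPTH_8U, 4); /* created once */

    for (int frame = 0; frame < 100; frame++)
    {
        cvSetData(header, buffer, header->widthStep); /* repoints the header, no copy, no per-frame alloc */
        cvShowImage("Color Image", header);
        cvWaitKey(10);
    }

    cvReleaseImageHeader(&header); /* frees only the header, never the external buffer */
    free(buffer);
    return 0;
}
Applied to the SkeletalViewer code, that would mean creating the header once at init, calling cvSetData on it in Nui_GotVideoAlert, and calling cvReleaseImageHeader once in Nui_Uninit.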

HLSL error X3086: DX9-style 'compile' syntax is deprecated in strict mode

Hey,
I get this error:
error X3086: DX9-style 'compile' syntax is deprecated in strict mode
When compiling a directx effect with this code:
hr=D3DX11CompileFromFile( TEXT("shaders\\basic.fx"), NULL, NULL, NULL,"fx_5_0", D3DCOMPILE_ENABLE_STRICTNESS, 0, NULL, &pBlob, &pErrorBlob, NULL );
I'm pretty sure it's complaining about this:
technique11 basic
{
pass p0
{
VertexShader = compile vs_5_0 vsMain();
PixelShader = compile ps_5_0 psMain();
}
}
So what am I supposed to use instead of compile?
Try:
technique11 basic
{
pass p0
{
SetVertexShader( CompileShader( vs_5_0, vsMain() ) );
SetPixelShader( CompileShader( ps_5_0, psMain() ) );
}
}
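The answer above is the Effects 11 way of doing it. If you want to keep the DX9-style syntax instead, the error text ("deprecated in strict mode") suggests it is only rejected because of D3DCOMPILE_ENABLE_STRICTNESS, so compiling without that flag may also work (an untested sketch based on the call from the question):
// Untested variation: keep 'VertexShader = compile vs_5_0 vsMain();' in the .fx file,
// but compile without strict mode so the DX9-style syntax is not rejected outright.
UINT flags = 0; // no D3DCOMPILE_ENABLE_STRICTNESS
hr = D3DX11CompileFromFile( TEXT("shaders\\basic.fx"), NULL, NULL, NULL, "fx_5_0",
                            flags, 0, NULL, &pBlob, &pErrorBlob, NULL );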
