Custom HLSL shader making weird patterns across icosphere - xna

Really hoping that someone can help me here. I can usually resolve bugs in C# on my own since I have a fair amount of experience with it, but I don't have a lot to go on with HLSL.
The picture linked below shows the same model (programmatically generated at run time) twice: the first (white) one rendered with BasicEffect and the second with my custom shader, listed below. The fact that it works with BasicEffect makes me think it's not an issue with generating the normals for the model or anything like that.
I've included different levels of subdivision to better illustrate the issue. It's worth mentioning that both effects use the same lighting direction.
https://imagizer.imageshack.us/v2/801x721q90/673/qvXyBk.png
Here's my shader code (feel free to pick it apart, any tips are very welcome):
float4x4 WorldViewProj;
float4x4 NormalRotation = float4x4(1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1);
float4 ModelColor = float4(1, 1, 1, 1);
bool TextureEnabled = false;

Texture ModelTexture;
sampler ColoredTextureSampler = sampler_state
{
    texture = <ModelTexture>;
    magfilter = LINEAR; minfilter = LINEAR; mipfilter = LINEAR;
    AddressU = mirror; AddressV = mirror;
};

float4 AmbientColor = float4(1, 1, 1, 1);
float AmbientIntensity = 0.1;
float3 DiffuseLightDirection = float3(1, 0, 0);
float4 DiffuseColor = float4(1, 1, 1, 1);
float DiffuseIntensity = 1.0;

struct VertexShaderInput
{
    float4 Position : POSITION0;
    float4 Normal : NORMAL0;
    float2 TextureCoordinates : TEXCOORD0;
};

struct VertexShaderOutput
{
    float4 Position : POSITION0;
    float4 Color : COLOR0;
    float2 TextureCoordinates : TEXCOORD0;
};

VertexShaderOutput VertexShaderFunction(VertexShaderInput input)
{
    VertexShaderOutput output = (VertexShaderOutput)0;
    output.Position = mul(input.Position, WorldViewProj);
    float4 normal = mul(input.Normal, NormalRotation);
    float lightIntensity = dot(normal, DiffuseLightDirection);
    output.Color = saturate(DiffuseColor * DiffuseIntensity * lightIntensity);
    output.TextureCoordinates = input.TextureCoordinates;
    return output;
}

float4 PixelShaderFunction(VertexShaderOutput input) : COLOR0
{
    float4 pixBaseColor = ModelColor;
    if (TextureEnabled == true)
    {
        pixBaseColor = tex2D(ColoredTextureSampler, input.TextureCoordinates);
    }
    float4 lighting = saturate((input.Color + AmbientColor * AmbientIntensity) * pixBaseColor);
    return lighting;
}

technique BestCurrent
{
    pass Pass1
    {
        VertexShader = compile vs_2_0 VertexShaderFunction();
        PixelShader = compile ps_2_0 PixelShaderFunction();
    }
}

In general, when implementing a lighting equation, there are a few things to ensure:
Normals, light directions, and other directional vectors should be normalized before using them in a dot product. In your case you could add something like:
normal = normalize(normal);
The same should be done for DiffuseLightDirection if it is not already normalized. It is with your default value, but if your app changes it, it might not be normalized anymore. For that it would be better to normalize in the application code, since it only needs to be done once when the value changes, not per vertex.
Also remember that if you are multiplying the vector by a matrix that contains a scale, the vector will no longer be normalized, so it will need to be re-normalized.
The light direction and the normal must point in the same direction, which is out from the surface. Your default light direction is (1,0,0). If you want light to travel in the +x direction, then you must actually negate the vector before performing the dot product with the normal, so that it points out from the surface just like the normal. If you already take this into account, then it's not a problem.
Vectors can't be translated since they are just a direction not a position. So it is important to ensure when you transform them with a matrix that either the fourth component (w) of the vector is 0 or the matrix you are transforming it with has no translation. Setting w to 0 will zero out any translation from the matrix during the multiply. Since your matrix is called NormalRotation, I'm assuming it only contains a rotation, so this probably isn't an issue.
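Putting those points together, a corrected vertex shader might look like the following sketch. It is only illustrative: it assumes NormalRotation really is rotation-only (with a defensive re-normalize anyway) and that DiffuseLightDirection stores the direction the light travels, so it gets negated before the dot product.
VertexShaderOutput VertexShaderFunction(VertexShaderInput input)
{
    VertexShaderOutput output = (VertexShaderOutput)0;
    output.Position = mul(input.Position, WorldViewProj);

    // w = 0 zeroes out any translation in the matrix during the multiply
    float4 normal = mul(float4(input.Normal.xyz, 0), NormalRotation);
    normal = normalize(normal); // guards against any scale in the matrix

    // Negate so the vector points out from the surface, like the normal
    float3 toLight = normalize(-DiffuseLightDirection);

    float lightIntensity = saturate(dot(normal.xyz, toLight));
    output.Color = saturate(DiffuseColor * DiffuseIntensity * lightIntensity);
    output.TextureCoordinates = input.TextureCoordinates;
    return output;
}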

Related

DirectX + GLM Depth Reconstruction issues

I'm trying to port my engine to DirectX and I'm currently having issues with depth reconstruction. It works perfectly in OpenGL (even though I use a somewhat expensive method), and every part besides the depth reconstruction works so far. I use GLM because it's a good math library with no dependencies for the user to install.
So basically I get my GLM matrices:
struct DefferedUBO {
    glm::mat4 view;
    glm::mat4 invProj;
    glm::vec4 eyePos;
    glm::vec4 resolution;
};

DefferedUBO deffUBOBuffer;
// ...
glm::mat4 projection = glm::perspective(engine.settings.fov, aspectRatio, 0.1f, 100.0f);

// Get my camera
CTransform *transform = &engine.transformSystem.components[engine.entities[entityID].components[COMPONENT_TRANSFORM]];

// Get the view matrix
glm::mat4 view = glm::lookAt(
    transform->GetPosition(),
    transform->GetPosition() + transform->GetForward(),
    transform->GetUp()
);

deffUBOBuffer.invProj = glm::inverse(projection);
deffUBOBuffer.view = glm::inverse(view);
if (engine.settings.graphicsLanguage == GRAPHICS_DIRECTX) {
    deffUBOBuffer.invProj = glm::transpose(deffUBOBuffer.invProj);
    deffUBOBuffer.view = glm::transpose(deffUBOBuffer.view);
}

// Abstracted so I can use OGL, DX, VK, or even Metal when I get around to it.
deffUBO->UpdateUniformBuffer(&deffUBOBuffer);
deffUBO->Bind();
Then in HLSL, I simply use the following:
cbuffer MatrixInfoType {
    matrix invView;
    matrix invProj;
    float4 eyePos;
    float4 resolution;
};

float4 ViewPosFromDepth(float depth, float2 TexCoord) {
    float z = depth; // * 2.0 - 1.0;
    float4 clipSpacePosition = float4(TexCoord * 2.0 - 1.0, z, 1.0);
    float4 viewSpacePosition = mul(invProj, clipSpacePosition);
    viewSpacePosition /= viewSpacePosition.w;
    return viewSpacePosition;
}

float3 WorldPosFromViewPos(float4 view) {
    float4 worldSpacePosition = mul(invView, view);
    return worldSpacePosition.xyz;
}

float3 WorldPosFromDepth(float depth, float2 TexCoord) {
    return WorldPosFromViewPos(ViewPosFromDepth(depth, TexCoord));
}

// ...
// Sample the hardware depth buffer.
float depth = shaderTexture[3].Sample(SampleType[0], input.texCoord).r;
float3 position = WorldPosFromDepth(depth, input.texCoord).rgb;
Here's the result (screenshot omitted): it just looks like random colors multiplied with the depth.
Ironically, when I remove the transposing, I get something closer to the truth, but not quite: you're looking at Crytek Sponza, and the green area moves and rotates with the bottom of the camera. I have no idea at all why.
(A third screenshot showed the correct version, along with albedo, specular, and normals.)
I fixed my problem at gamedev.net. There was a matrix majority (row- vs column-major) issue as well as a depth-handling issue:
https://www.gamedev.net/forums/topic/692095-d3d-glm-depth-reconstruction-issues
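For anyone hitting the same wall, the two D3D-specific pitfalls are worth spelling out in code. Hardware depth in D3D is already in [0, 1] (no * 2 - 1 remap, unlike OpenGL's [-1, 1] NDC z), and texture V grows downward while NDC y grows upward. A minimal sketch, assuming the same column-vector mul(matrix, vector) convention as the code above (the V flip is an assumption about how the fullscreen texture coordinates were generated):
float4 ViewPosFromDepthD3D(float depth, float2 texCoord)
{
    // D3D hardware depth is already in [0, 1]; no remap needed
    float2 ndc = texCoord * 2.0 - 1.0;
    ndc.y = -ndc.y; // texture V runs down the screen, NDC y runs up

    float4 clipPos = float4(ndc, depth, 1.0);
    float4 viewPos = mul(invProj, clipPos);
    return viewPos / viewPos.w; // undo the perspective divide
}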

Texture sampler in HLSL does not interpolate

I am currently working on a multi-textured terrain and I have problems with the Sample function of Texture2DArray.
In my example, I use a Texture2DArray to store a set of different terrain textures, e.g. grass, sand, asphalt, etc. Each of my vertices stores a texture (UV) coordinate and the index of the texture I want to use: if the index is 0, I use the first texture; if it is 1, the second; and so on. This works fine as long as the index is a whole number (0, 1, ...), but it fails if the index is fractional (like 1.5f).
In order to look for the problem, I reduced my entire pixel shader to this:
Texture2DArray DiffuseTextures : register(t0);
Texture2DArray NormalTextures : register(t1);
Texture2DArray EmissiveTextures : register(t2);
Texture2DArray SpecularTextures : register(t3);
SamplerState Sampler : register(s0);

struct PS_IN
{
    float4 pos : SV_POSITION;
    float3 nor : NORMAL;
    float3 tan : TANGENT;
    float3 bin : BINORMAL;
    float4 col : COLOR;
    float4 TextureIndices : COLOR1;
    float4 tra : COLOR2;
    float2 TextureUV : TEXCOORD0;
};

float4 PS(PS_IN input) : SV_Target
{
    float4 texCol = DiffuseTextures.Sample(Sampler, float3(input.TextureUV, input.TextureIndices.r));
    return texCol;
}
The following image shows the result of a sample scene on the left side. As you can see, there is a hard border between the used textures. There is no form of interpolation.
In order to check my texture indices, I changed my pixel shader from above by returning the texture indices as a color:
return float4(input.TextureIndices.r, input.TextureIndices.r, input.TextureIndices.r, 1.0f);
The result can be seen on the right side of the image. The texture indices are correct, since they range in the interval [0, 1] and you can clearly see the interpolation at the border of the area. However, my sampled texture does not show any form of interpolation.
Since my pixel shader is pretty simple, I wonder what causes this behaviour? Is there any setting in DirectX responsible for this?
I use DirectX 11, pixel shader ps_5_0 (I also tested with ps_4_0) and I use DDS textures (BC3 compression).
Edit
This is the sampler I am using:
SharpDX.Direct3D11.SamplerStateDescription samplerStateDescription = new SharpDX.Direct3D11.SamplerStateDescription()
{
    AddressU = SharpDX.Direct3D11.TextureAddressMode.Wrap,
    AddressV = SharpDX.Direct3D11.TextureAddressMode.Wrap,
    AddressW = SharpDX.Direct3D11.TextureAddressMode.Wrap,
    Filter = SharpDX.Direct3D11.Filter.MinMagMipLinear
};
SharpDX.Direct3D11.SamplerState samplerState = new SharpDX.Direct3D11.SamplerState(_device, samplerStateDescription);
_deviceContext.PixelShader.SetSampler(0, samplerState);
Solution
I made a function using the code presented by catflier for getting a texture color:
float4 GetTextureColor(Texture2DArray textureArray, float2 textureUV, float textureIndex)
{
    float tid = textureIndex;
    int id = (int)tid;
    float l = frac(tid);
    float4 texCol1 = textureArray.Sample(Sampler, float3(textureUV, id));
    float4 texCol2 = textureArray.Sample(Sampler, float3(textureUV, id + 1));
    return lerp(texCol1, texCol2, l);
}
This way, I can get the desired texture color for all texture types (diffuse, specular, emissive, ...) with a simple function call:
float4 texCol = GetTextureColor(DiffuseTextures, input.TextureUV, input.TextureIndices.r);
float4 bumpMap = GetTextureColor(NormalTextures, input.TextureUV, input.TextureIndices.g);
float4 emiCol = GetTextureColor(EmissiveTextures, input.TextureUV, input.TextureIndices.b);
float4 speCol = GetTextureColor(SpecularTextures, input.TextureUV, input.TextureIndices.a);
The result is as smooth as I wanted it to be. :-)
Texture arrays do not sample across slices, so technically this is the expected result.
If you want to interpolate between slices (e.g. an index of 1.5f gives you "half" of the second texture and "half" of the third), you can use a Texture3D instead, which allows this (but costs a bit more, since it performs trilinear filtering).
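For illustration, a Texture3D version might look like the sketch below. The resource name, register, and sliceCount parameter are hypothetical, and it assumes the terrain textures are packed as evenly spaced slices along W:
Texture3D DiffuseVolume : register(t0); // hypothetical: textures packed along W

float4 SampleSlices(float2 uv, float slice, float sliceCount)
{
    // Sampling between slice centers lets the hardware's trilinear
    // filtering blend two adjacent textures for fractional indices
    float w = (slice + 0.5) / sliceCount;
    return DiffuseVolume.Sample(Sampler, float3(uv, w));
}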
Otherwise, you can perform your sampling this way:
float4 PS(PS_IN input) : SV_Target
{
    float tid = input.TextureIndices.r;
    int id = (int)tid;
    float l = frac(tid); // lerp amount
    float4 texCol1 = DiffuseTextures.Sample(Sampler, float3(input.TextureUV, id));
    float4 texCol2 = DiffuseTextures.Sample(Sampler, float3(input.TextureUV, id + 1));
    return lerp(texCol1, texCol2, l);
}
Please note that this technique is quite a bit more flexible, since you can also provide non-adjacent slices as input (so you can lerp between slice 2 and 23, for example), and eventually use a different blend mode by replacing lerp with some other function.
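One small caveat on this approach: at the last slice, id + 1 points past the end of the array. Direct3D clamps out-of-range array indices, so the last slice simply blends with itself, but you can make that explicit; sliceCount here is a hypothetical constant you would supply:
int id2 = min(id + 1, sliceCount - 1); // avoid indexing past the last slice
float4 texCol2 = DiffuseTextures.Sample(Sampler, float3(input.TextureUV, id2));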

Shadow Mapping Artifacts

I just started messing around with shadow mapping, and I understand the algorithm used. The thing is, I cannot for the life of me figure out where I am messing up in the HLSL code. Here it is:
//These change
float4x4 worldViewProj;
float4x4 world;
texture tex;

//These remain constant
float4x4 lightSpace;
float4x4 lightViewProj;
float4x4 textureBias;
texture shadowMap;

sampler TexS = sampler_state
{
    Texture = <tex>;
    MinFilter = LINEAR;
    MagFilter = LINEAR;
    MipFilter = LINEAR;
    AddressU = WRAP;
    AddressV = WRAP;
};

sampler TexShadow = sampler_state
{
    Texture = <shadowMap>;
    MinFilter = LINEAR;
    MagFilter = LINEAR;
    MipFilter = LINEAR;
};

struct A2V
{
    float3 posL : POSITION0;
    float2 texCo : TEXCOORD0;
};

struct OutputVS
{
    float4 posH : POSITION0;
    float2 texCo : TEXCOORD0;
    float4 posW : TEXCOORD2;
};

//Vertex shader, depth pass
OutputVS DepthVS(A2V IN)
{
    OutputVS OUT = (OutputVS)0;
    //Get screen coordinates in light space for texture map
    OUT.posH = mul(float4(IN.posL, 1.f), lightViewProj);
    //Get the depth by performing a perspective divide on the projected coordinates
    OUT.posW.x = OUT.posH.z/OUT.posH.w;
    return OUT;
}

//Pixel shader, depth pass
float4 DepthPS(OutputVS IN) : COLOR
{
    //Texture only uses the red channel, just store it there
    return float4(IN.posW.x, 0, 0, 1);
}

//Vertex shader, draw pass
OutputVS DrawVS(A2V IN)
{
    OutputVS OUT = (OutputVS)0;
    //Get the screen coordinates for this pixel
    OUT.posH = mul(float4(IN.posL, 1.f), worldViewProj);
    //Send texture coordinates through
    OUT.texCo = IN.texCo;
    //Pass its world coordinates through
    OUT.posW = mul(float4(IN.posL, 1.f), world);
    return OUT;
}

//Pixel shader, draw pass
float4 DrawPS(OutputVS IN) : COLOR
{
    //Get the pixel's screen position in light space
    float4 texCoord = mul(IN.posW, lightViewProj);
    //Perform perspective divide to normalize coordinates [-1,1]
    //texCoord.x = texCoord.x/texCoord.w;
    //texCoord.y = texCoord.y/texCoord.w;
    //Multiply by texture bias to bring into range 0-1
    texCoord = mul(texCoord, textureBias);
    //Get the corresponding depth value
    float prevDepth = tex2D(TexShadow, texCoord.xy);
    //Check if it is in shadow
    float4 posLight = mul(IN.posW, lightViewProj);
    float currDepth = posLight.z/posLight.w;
    if (currDepth >= prevDepth)
        return float4(0.f, 0.f, 0.f, 1.f);
    else
        return tex2D(TexS, IN.texCo);
}

//Effect info
technique ShadowMap
{
    pass p0
    {
        vertexShader = compile vs_2_0 DepthVS();
        pixelShader = compile ps_2_0 DepthPS();
    }
    pass p1
    {
        vertexShader = compile vs_2_0 DrawVS();
        pixelShader = compile ps_2_0 DrawPS();
    }
}
I have verified that all my matrices are correct and that the depth map is being drawn correctly. I rewrote all of the C++ that handles this code and made it neater, and I am still getting the same problem. I am not currently blending the shadows, just drawing them flat black until I can get them to draw correctly. The light uses an orthographic projection because it is a directional light. I don't have enough reputation points to embed images, but here are the URLs:
Depth map: http://i.imgur.com/T2nITid.png
Program output: http://i.imgur.com/ae3U3N0.png
Any help or insight would be greatly appreciated, as it's for a school project. Thanks.
The value you get from the depth buffer is a float between 0 and 1. As you probably already know, floating-point values have limited precision: the more decimal places you need, the less accurate the value is, and that is where the artifacts come from.
There are some things you can do. The easiest is to bring the near and far Z values of the projection matrix closer together, so the depth buffer doesn't need as many decimal places to represent how far away an object is. I usually find that a near/far range of 1-200 gives a fairly accurate result.
Another easy thing you can do is increase the resolution of the shadow-map texture; more pixels represent the scene more accurately.
There are also a lot of more complex things that game engines do to improve on shadow-mapping artifacts, but you could write a book about that; if you really want to get into it, I would recommend starting with a good graphics blog.
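One standard mitigation not shown in code above is a small depth bias in the comparison, which suppresses the self-shadowing speckle ("shadow acne") caused by exactly this precision problem. As a sketch against the DrawPS above, with a bias value you would tune per scene:
float bias = 0.002f; // too small brings back acne, too large detaches shadows
if (currDepth - bias >= prevDepth)
    return float4(0.f, 0.f, 0.f, 1.f); // in shadow
else
    return tex2D(TexS, IN.texCo);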

Sampling a texture within vertex shader?

I'm using DirectX 11 targeting shader model 5 (actually DirectX 11.2 via SharpDX), and I'm at a loss as to what is wrong with this simple shader I'm writing.
The shader is applied to a flat, high-poly plane (so there are plenty of vertices to displace); the texture is sampled without issues (the pixel shader displays it fine), but the displacement in the vertex shader just doesn't work.
It's not an issue with the displacement itself: replacing the += height.SampleLevel with += 0.5 shows all vertices displaced. It's not an issue with the sampling either, since the very same code works in the pixel shader. And as far as I understand, it's not an API-usage issue, since SampleLevel (unlike Sample) is perfectly usable in the VS because you provide the LOD level yourself.
Using a noise function to displace instead of a texture also works just fine, which leads me to think the issue is with texture sampling, and only inside the VS, which is odd since I'm using a function that is supposedly VS-compatible.
I've been trying random things for the past hour and I'm really clueless as to where to look. I also find almost no info about displacement mapping in HLSL, and even less so for DX11+, to use as reference.
struct VS_IN
{
    float4 pos : POSITION;
    float2 tex : TEXCOORD;
};

struct PS_IN
{
    float4 pos : SV_POSITION;
    float2 tex : TEXCOORD;
};

float4x4 worldViewProj;
Texture2D<float4> diffuse : register(t0);
Texture2D<float4> height : register(t1);
Texture2D<float4> lightmap : register(t2);
SamplerState pictureSampler;

PS_IN VS(VS_IN input)
{
    PS_IN output = (PS_IN)0;
    input.pos.z += height.SampleLevel(pictureSampler, input.tex, 0).r;
    output.pos = mul(input.pos, worldViewProj);
    output.tex = input.tex;
    return output;
}

float4 PS(PS_IN input) : SV_Target
{
    return height.SampleLevel(pictureSampler, input.tex, 0).rrrr;
}
For reference in case it matters:
Texture description :
new SharpDX.Direct3D11.Texture2DDescription()
{
    Width = bitmapSource.Size.Width,
    Height = bitmapSource.Size.Height,
    ArraySize = 1,
    BindFlags = SharpDX.Direct3D11.BindFlags.ShaderResource,
    Usage = SharpDX.Direct3D11.ResourceUsage.Immutable,
    CpuAccessFlags = SharpDX.Direct3D11.CpuAccessFlags.None,
    Format = SharpDX.DXGI.Format.R8G8B8A8_UNorm,
    MipLevels = 1,
    OptionFlags = SharpDX.Direct3D11.ResourceOptionFlags.None,
    SampleDescription = new SharpDX.DXGI.SampleDescription(1, 0),
};
Sampler description :
var sampler = new SamplerState(device, new SamplerStateDescription()
{
    Filter = Filter.MinMagMipLinear,
    AddressU = TextureAddressMode.Clamp,
    AddressV = TextureAddressMode.Clamp,
    AddressW = TextureAddressMode.Clamp,
    BorderColor = Color.Pink,
    ComparisonFunction = Comparison.Never,
    MaximumAnisotropy = 16,
    MipLodBias = 0,
    MinimumLod = 0,
    MaximumLod = 16
});
So the answer was that Ronan wasn't binding the shader resource view to the vertex shader stage (in SharpDX, SetShaderResource on DeviceContext.VertexShader, not just on PixelShader), so the VS had no texture to sample; binding it there made the displacement work.

HLSL Pixel shader lighting performance (XNA)

I have a simple enough shader that supports multiple point lights.
Lights are stored as an array of Light structs (up to a max size), and I pass in the number of active lights when it changes.
The problem is in the PixelShader function. It's basic stuff: get the base color from the texture, loop through the lights array from 0 to numActiveLights, and add each light's effect. It works fine, but performance is terrible!
BUT if I replace the reference to the global variable numActiveLights with a constant of the same value, performance is fine.
I just can't fathom why referencing the variable makes a 30+ FPS difference.
Can anyone please explain?
Full Shader code:
#define MAX_POINT_LIGHTS 16

struct PointLight
{
    float3 Position;
    float4 Color;
    float Radius;
};

float4x4 World;
float4x4 View;
float4x4 Projection;
float3 CameraPosition;
float4 SpecularColor;
float SpecularPower;
float SpecularIntensity;
float4 AmbientColor;
float AmbientIntensity;
float DiffuseIntensity;
int activeLights;
PointLight lights[MAX_POINT_LIGHTS];
bool IsLightingEnabled;
bool IsAmbientLightingEnabled;
bool IsDiffuseLightingEnabled;
bool IsSpecularLightingEnabled;

Texture Texture;
sampler TextureSampler = sampler_state
{
    Texture = <Texture>;
    Magfilter = POINT;
    Minfilter = POINT;
    Mipfilter = POINT;
    AddressU = WRAP;
    AddressV = WRAP;
};

struct VS_INPUT
{
    float4 Position : POSITION0;
    float2 TexCoord : TEXCOORD0;
    float3 Normal : NORMAL0;
};

struct VS_OUTPUT
{
    float3 WorldPosition : TEXCOORD0;
    float4 Position : POSITION0;
    float3 Normal : TEXCOORD1;
    float2 TexCoord : TEXCOORD2;
    float3 ViewDir : TEXCOORD3;
};

VS_OUTPUT VS_PointLighting(VS_INPUT input)
{
    VS_OUTPUT output;
    float4 worldPosition = mul(input.Position, World);
    output.WorldPosition = worldPosition;
    float4 viewPosition = mul(worldPosition, View);
    output.Position = mul(viewPosition, Projection);
    output.Normal = normalize(mul(input.Normal, World));
    output.TexCoord = input.TexCoord;
    output.ViewDir = normalize(CameraPosition - worldPosition);
    return output;
}

float4 PS_PointLighting(VS_OUTPUT IN) : COLOR
{
    if (!IsLightingEnabled) return tex2D(TextureSampler, IN.TexCoord);

    float4 color = float4(0.0f, 0.0f, 0.0f, 0.0f);
    float3 n = normalize(IN.Normal);
    float3 v = normalize(IN.ViewDir);
    float3 l = float3(0.0f, 0.0f, 0.0f);
    float3 h = float3(0.0f, 0.0f, 0.0f);
    float atten = 0.0f;
    float nDotL = 0.0f;
    float power = 0.0f;

    if (IsAmbientLightingEnabled) color += (AmbientColor * AmbientIntensity);

    if (IsDiffuseLightingEnabled || IsSpecularLightingEnabled)
    {
        //for (int i = 0; i < activeLights; ++i) // works, but performance is terrible
        for (int i = 0; i < 7; ++i) // performance is fine, but obviously isn't dynamic
        {
            l = (lights[i].Position - IN.WorldPosition) / lights[i].Radius;
            atten = saturate(1.0f - dot(l, l));
            l = normalize(l);
            nDotL = saturate(dot(n, l));
            if (IsDiffuseLightingEnabled) color += (lights[i].Color * nDotL * atten);
            if (IsSpecularLightingEnabled) color += (SpecularColor * SpecularPower * atten);
        }
    }

    return color * tex2D(TextureSampler, IN.TexCoord);
}

technique PerPixelPointLighting
{
    pass
    {
        VertexShader = compile vs_3_0 VS_PointLighting();
        PixelShader = compile ps_3_0 PS_PointLighting();
    }
}
My guess is that changing the loop bound to a compile-time constant is allowing the HLSL compiler to unroll the loop. That is, instead of this:
for (int i = 0; i < 7; i++)
    doLoopyStuff();
It's getting turned into this:
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
Loops and conditional branches can be a significant performance hit inside of shader code, and should be avoided wherever possible.
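As a side note (not part of the original answer): on shader model 3 you can also state your intent to the compiler explicitly with loop attributes. [unroll] forces unrolling but needs a trip count the compiler can resolve at compile time; [loop] asks for a real loop with dynamic branching, which ps_3_0 supports, though whether it actually helps depends on the hardware. A sketch:
[unroll] // requires a compile-time constant bound such as MAX_POINT_LIGHTS
for (int i = 0; i < MAX_POINT_LIGHTS; ++i)
{
    // ... per-light work ...
}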
EDIT
This is just off the top of my head, but maybe you could try something like this?
for (int i = 0; i < MAX_LIGHTS; i++)
{
    // step(a, b) is 1 when b >= a, so i + 1 <= activeLights means light i is active
    color += step(i + 1, activeLights) * lightingFunction();
}
This way you calculate all possible lights but always get a value of 0 for the inactive ones. The benefit would depend on the complexity of the lighting function, of course; you would need to do more profiling.
Try using PIX to profile it. http://wtomandev.blogspot.com/2010/05/debugging-hlsl-shaders.html
Alternatively, read this rambling speculation:
Maybe because with a constant the compiler can unroll and collapse your loop's instructions. When you replace it with a variable, the compiler becomes unable to make the same assumptions.
Though, somewhat unrelated to your actual question, I would push a lot of those conditions and calculations up to the software level, for example a branch like:
if (IsDiffuseLightingEnabled || IsSpecularLightingEnabled)
Also, I think you could precompute a few things before you call the shader program, like l = (lights[i].Position - IN.WorldPosition) / lights[i].Radius; pass in a precomputed array of those rather than recalculating it every time for every pixel.
I might be misinformed about the optimizations the HLSL compiler makes, but I think each calculation like that in the pixel shader gets executed screen width * height times (though insanely in parallel), and I vaguely remember there being limits on the number of instructions a shader could have (64 arithmetic instructions in ps_2_0, a restriction that was relaxed a lot in later shader models). Maybe your shader generates so many instructions that it gets broken up into a multi-pass pixel shader at compilation; if that's the case, that probably adds significant overhead.
Actually, here's another idea that might be stupid: passing a variable to a shader transmits the data to the GPU, and that transmission happens over limited bandwidth. Perhaps the compiler is smart enough that when you're only statically indexing the first 7 elements of an array, it only transfers those 7 elements. When the compiler can't make that optimization (because you aren't iterating with constants), it pushes the WHOLE array every frame, and you're flooding the bus. If that's the case, then my earlier suggestion of pushing calculations out and passing more results in would only make the problem worse, heh.
