099 compute shaders with images only?

Hi,

Having my very first look at compute shaders in 099 (and compute shaders in general), I’m guessing they’re only meant to be used with imageLoad/Store and deal with image variables, and not shader buffer objects?

I'm guessing that for shader storage buffer objects to make sense, a means of accessing them when rendering geometry would be needed, though I'm not sure how that would work currently.

I was looking at this NVIDIA sample: docs.nvidia.com/gameworks/index … s%7C_____6

Thanks a lot!
Best,
Vincent

Currently we don't have support for shader storage buffers. I think compute shaders could also read from and write to those if we added support.

+1 for shader storage buffers, they appear to be very useful (maybe essential?) for compute shader use.

Is it possible for a compute shader to output to a 2D texture array? I’m a bit stuck because you can’t output to arbitrary layers (aka layout(location=0) out vec4 buffer0 will fail to compile), shader storage buffers aren’t implemented yet, and then when I switch the GLSLTop to output a 2d texture array or 3d texture, the imageStore function fails. The message produced is unable to find compatible overloaded function "imageStore(struct unimage3D4x8_bindless, ivec2, vec4)" . Does this mean the implementation for imageStore accessing 2d texture arrays or 3d textures hasn’t been implemented yet?

Thanks!

Compute shaders have no notion of a direct output, so location semantics don’t work. The only output is done via imageStore(). To write to a 3D texture you need 3 coordinates, not 2. The error I see there says you are giving it an ivec2, which means you aren’t saying which slice in the 3D texture you want to write to.
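A minimal sketch of the three-coordinate write described above. It assumes the GLSL TOP is set to output a 3D texture, so that sTDComputeOutputs[0] (the name used elsewhere in this thread, declared by TouchDesigner rather than by the shader) is a 3D image. The slice index is a hypothetical value chosen only for illustration.

```glsl
// Sketch only: assumes TouchDesigner declares sTDComputeOutputs and that
// the GLSL TOP output type is a 3D texture (so the image is 3D).
layout (local_size_x = 8, local_size_y = 8) in;
void main()
{
    // Hypothetical slice index, picked for illustration.
    int slice = 0;
    // x and y select the pixel, the third component selects the slice.
    ivec3 coord = ivec3(ivec2(gl_GlobalInvocationID.xy), slice);
    imageStore(sTDComputeOutputs[0], coord, vec4(1.0));
}
```

Passing an ivec2 here instead of an ivec3 is what produces the "unable to find compatible overloaded function" message quoted above.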

I don’t think they are essential. You can do lots of things with compute shaders and images without shader storage buffers. Shader storage buffers are mainly for ease of use, but functionally you can do the same thing with textures and a little more code.

That’s not to say we won’t add them, but we just haven’t yet.

Thanks for the clarification, Malcolm! I was hoping I wasn’t missing something simple like that. :unamused: Storage buffers aren’t really needed then, just a matter of convenience, like you said.

Actually Malcolm, after working on this some more, I realize there is a reason to use shader storage buffers: non-normalized data. If the only way to get data out of a compute shader is an output color buffer, all data has to be normalized to [0, 1], correct? Whereas in a traditional vertex / geo / pixel shader you'd pass arbitrary information down the pipeline within a struct.

I'm new to compute shaders, but I already have data coming out of one. Super cool tools to play with! Thanks for adding them in.

Jumping back in: it seems it's the same as a regular fragment shader. Make sure the pixel format is 16-bit or 32-bit float, and then you can write arbitrary values.
Modifying the default example from vec4(1.0) to vec4(10.0) and monitoring the value with a TopTo, it seems fine.

Malcolm, since we still have to pack things in vec4 and multiple color buffers :wink: quick question for you :
How do I output to all the color buffers in the compute shader?

I tried

imageStore(sTDComputeOutputs[0], ivec2(gl_GlobalInvocationID.xy), vec4(0.1,0.0,0,1));
imageStore(sTDComputeOutputs[1], ivec2(gl_GlobalInvocationID.xy), vec4(0.0,1.0,0,1));

but it fills buffer 0 with green and puts nothing in buffer 1.
What is sTDComputeOutputs? A bit more documentation, at least on the TD specifics, would be great!

Thanks a lot!

Vincent, you’re totally correct! I just assumed anything over 1.0 was clamped, since I hadn’t switched the top type to 32bit. :unamused:

Malcolm, I take back my last post about the normalized data. :blush: Moving forward with texture buffers!

Vincent, as a stopgap for your color buffer question, what I’m doing is using a 2d texture array. Make sure that’s enabled in the glslTop with an appropriate number of layers, then try this:

imageStore(sTDComputeOutputs[0], ivec3(gl_GlobalInvocationID.xy,0), pos);
imageStore(sTDComputeOutputs[0], ivec3(gl_GlobalInvocationID.xy,1), vel);

Thanks Jonathan!

I probably won't go super deep into compute shaders right now, though. I was interested in streamlining the GPU sim → lookup in the vertex shader workflow, but it seems it would be the same with compute shaders as it is now.

Still, they seem to have some nice perks like atomic counters and shared memory, so I will definitely look into them more.

Cheers
Vincent

There was a bug with multiple buffer output for compute shaders. This will be fixed in build 2017.2580 or later. I’ll also add some more documentation right now.

Great, thanks Malcolm!

I'm not sure whether it's part of the not-yet-implemented SSBOs, but am I correct that atomic counters aren't available yet either?

Trying something like the following always gives me a counter of 0:

layout(binding=0, offset=0) uniform atomic_uint ac;
layout (local_size_x = 16, local_size_y = 16) in;
void main()
{
	uint counter = atomicCounterIncrement(ac);
	imageStore(sTDComputeOutputs[0],ivec2(gl_GlobalInvocationID.xy), vec4(counter));
}

I’m quite new to compute shaders, so perhaps I’m missing something here?

Cheers,
Tim

Haven’t tried myself yet and also new to compute shaders, but curious about the answer as well/a working example, since atomic counters seem a great feature of compute shaders.

Vincent

They are currently not supported. I’ll look into adding them though. I’m pretty swamped right now finishing off some other features.

Thanks Malcolm! It seems this page was added pretty recently, which makes me hopeful :slight_smile:
derivative.ca/wiki099/index. … ute_Shader

Will definitely start looking into them more seriously now that 099 is officially out and bother you with more questions!

Yes! Please update us when atomic counters or SSBOs are on the way.
:mrgreen: :mrgreen: :mrgreen:

Hi Malcolm,

One thing on the wiki (derivative.ca/wiki099/index. … _Shaders_2) caught my eye :

Compute Shaders
When creating a 3D Texture or a 2D Texture Array with a compute shader, the shader is still only ran once. The entire output texture is available to be written to using imageStore, and should be filled as desired, possibly with a Z dispatch size equal to the depth of the texture.
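A minimal sketch of the approach the wiki passage describes: one invocation per texel of the whole 3D output in a single pass. It assumes the output type is a 3D texture or 2D texture array, that the Z dispatch size is set equal to the texture depth, and that sTDComputeOutputs is declared by TouchDesigner as in the snippets earlier in this thread. The color written is a placeholder.

```glsl
// Sketch only: assumes dispatch size Z == texture depth and
// local_size_z == 1, so gl_GlobalInvocationID.z is the slice index.
layout (local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
void main()
{
    // One invocation per texel: x, y are the pixel, z is the slice.
    ivec3 coord = ivec3(gl_GlobalInvocationID);
    // Placeholder value: store the slice index in the red channel.
    vec4 color = vec4(float(coord.z), 0.0, 0.0, 1.0);
    imageStore(sTDComputeOutputs[0], coord, color);
}
```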

I still have to get more familiar with GPU profiling, but does that mean using a compute shader to write to a 3D texture is more efficient?

Thanks!

Another question/bug report, going through
khronos.org/opengl/wiki/Compute_Shader

doing color.xyz = gl_WorkGroupSize;
causes a fatal error :

(74) : fatal error C9999: *** exception during compilation ***

I guess I don't really need gl_WorkGroupSize, since it's defined by the local_size_x qualifier above and, unlike with CUDA, gl_GlobalInvocationID and gl_LocalInvocationID are provided. Still, the error seems odd.

Another note: for half a second the default dispatch size of 64x64x1 felt confusing, since the default texture size is 256x256 and the default code is
layout (local_size_x = 8, local_size_y = 8) in;
With 8x8 work groups, 256 / 8 = 32, so 32x32x1 would make more sense.
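For reference, the number of work groups needed per axis is the texture size divided by the local size, rounded up, so 256 / 8 = 32 here. A minimal sketch of a shader that tolerates a larger dispatch such as 64x64x1 by skipping invocations that fall outside the output, assuming a 2D output and that sTDComputeOutputs is declared by TouchDesigner:

```glsl
// Sketch only: assumes a 2D output image declared by TouchDesigner.
layout (local_size_x = 8, local_size_y = 8) in;
void main()
{
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    // imageSize() returns the image dimensions in pixels.
    ivec2 size = imageSize(sTDComputeOutputs[0]);
    // A 64x64 dispatch of 8x8 groups covers 512x512 invocations;
    // those beyond a 256x256 texture exit early here.
    if (any(greaterThanEqual(coord, size)))
        return;
    imageStore(sTDComputeOutputs[0], coord, vec4(1.0));
}
```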

Cheers
Vincent
