How to access vbe video memory? - memory

I am wondering how to create a pointer to the vbe video memory so I can display graphics from my 32-bit os kernel. Can anyone help me?

Download the VBE 3.0 standard (Wikipedia has all the links). Using it, write code to call function 2 to set a graphics mode, making sure you bitwise-OR the mode number with 4000h to tell the BIOS to use a flat linear frame buffer instead of windowed/segmented. Use function 1 to obtain the address of the video buffer (it should be in ModeInfoBlock.PhysBasePtr).
There should be plenty of reference code online for things like this.
I suggest to call VBE functions prior to going into protected mode as it will be harder to do afterwards.

Related

Should CUDA stream be waited to be complete even if the output data are to be sent to OpenGL instead of CPU?

This is a general question, and although I use OpenCV as a framework, the question is broader than OpenCV's realm.
I am developing an image processing tool that will effectively get image from a webcam (yielding a host-memory located cv::Mat), upload it to a GPU device memory in CUDA (i.e. cv::GpuMat), do some processing using CUDA and get a result finalCudaMat, and finally send the result to OpenGL (i.e. cv::ogl::Buffer::mapDevice + finalCudaMat.copyTo(mappedOglBuffer)). Everything works as intended.
Since the whole process involves multiple steps, I use a CUDA stream object (cv::cuda::Stream) to be able to make CUDA calls asynchronous and not wait on every single operation to be finished on CPU side. Now if someone instead is to eventually copy the result to a CPU matrix (i.e. finalCudaMat.download(finalCpuMat)), as in a customary situation, typically a wait on the stream is required (cudaStream.waitForCompletion()) to ensure the result is ready before using the CPU side matrix.
In my case, the the result never gets back to the CPU as it continues to be rendered on the screen (a bit of OpenGL operations and shaders are also involved).
One way, it might be appropriate to wait for CUDA work to finish before starting to copy the GpuMat to OpenGL Buffer. So if I add the wait on stream, everything is working fine and the CUDA operations take ~2.5ms.
Another way, it feels like I don't need to wait for completion of the stream (all the results are consumed by the GPU anyway -- CPU is never invovled again). Therefore I can remove the cudaStream.waitForCompletion() call before performing finalCudaMat.copyTo(mappedOglBuffer), and everything seems to be working fine. The whole CUDA processing operation (basically any GPU task minus OpenGL related) apparently takes ~1.8ms for me.
In the past I have had bad experience of not properly synchronization GPU work if two different APIs were involved (e.g. do something on Direct3D 9, do not wait for it to finish, and then copy the resulting texture to a Direct3D 10 texture, and clearly on some frames the image becomes empty or torn).
At this point, the difference is tiny and doesn't affect my 60 FPS throughput. But I wonder if I am technically doing a correct work by removing the wait-on-stream operation. Any thoughts on this? Or maybe a document regarding OpenGL/CUDA interop that could help me?
The rules are defined in this document: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#graphics-interoperability
In particular it says that
Accessing a resource through OpenGL, Direct3D, or another CUDA context while it is mapped produces undefined results.
That's a very strong hint that the needed synchronization is performed by cudaGraphicsUnmapResources, which is confirmed by its documentation:
This function provides the synchronization guarantee that any CUDA work issued in stream before cudaGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.
So you won't need to make the CPU wait on CUDA completion, but you must call cudaGraphicsUnmapResources which will put the appropriate barrier in the asynchronous instruction stream. Note that unlike your CPU transfer code, this call goes after CUDA copies data into the OpenGL buffer.
As Ben Voigt already pointed out, CUDA requires explicit synchronization with OpenGL (or any other graphics API that interoperates with it). Now this used the be kind of a chore, where one had to submit callbacks to the compute stream and use them to manually work with e.g. OpenGL fences.
However due to the advent of Vulkan and with it the support for external resources (and OpenGL extensions for that) you can in fact synchonize between CUDA and OpenGL command streams, by having both sides import platform native semaphores (cudaImportExternalSemaphore, GL_EXT_semaphore) and use them for mutual synchronization. It usually still involves a whole round trip through the CPU side driver, but since that part has to manage the command streams anyway it's not really an issue of efficiency.

STM32F4 Flash Memory Write/Read Questions

I want to declare 4 large arrays of each can store 48000 float_32 constants.
I am using STM32F4, so i figured out ram isnt enough and i should use flash
instead. Upon my research, seems like everyone is talking about one should never use flash for write/read processes on runtime, go for eeprom instead. I am aware of the risks, but why do 1mb flash memory is in there if i should never use it? What is it used for?
On my real problem, i just want to be able to write and generate a sampling array and write it on flash for once the device is initialized, and never be touched again. It is safe this way, but I can't find any useful and easy to understand tutorials anywhere for writing on flash. Is it forbidden? Should we not use 1mb of huge space available on the chip? Please point me to a nice tutorial on writing/reading STM32F4 flash.

Access the whole video memory

I'm looking for a way to read the whole video memory that a video card outputs to a display. That includes also hardware accelerated output, video playback and output in fullscreen mode (that somehow I feel could be different from windowed mode).
In short: I want to be able to capture everything that is going to be represented on a display.
I suppose that IF that's possible it would be os-dependant. The targets I'm interested in are Windows OSX and Linux.
Do you have any hint?
For windows I guess you could take CamStudio, strip it down and use it to record the screen then do whatever you want with the output, other than that you could look into forensic kernel drivers for accessing RAM. It's not exactly as simple as a pointer pointing to the video memory anymore, haha.
Digital Rights Management, requested feature of Windows, attempts to block your access to blocks of graphics-card frame buffer memory. Using an open-source driver under Linux would seem to be the only way to access this memory, or as mentioned earlier, some 3rd party software that knows some back doors or hacks or ways to locate other program's frame buffer space.
Unless of course, you are trying to capture output from your own program (ie you are calling the video/graphics creation functions yourself), there are APIs to manipulate display frames in DirectX and OpenGL.
I think I found some resources that can help to capture the display memory in Windows
Fastest method of screen capturing
How to save backbuffer to file in DirectX 10?
http://betterlogic.com/roger/2010/07/fast-screen-capture/

What is the best size for a buffer in BlackBerry?

In my application I need to read data from an input stream. I have set the current buffer size for reading as 1024. But I have seen in some Android applications buffer size has been kept as 8192 (8 KB). Will there be any specific advantage if I increase the buffer size in my application to 8KB?
Any expert opinion will be much appreciated.
Edit: (I am using BB OS 6 and 7 and I am dealing with network inputstream.)
I can't say that I've found the universally best buffer size, but it seems to me that something in the range of 1KB to 8KB should be fine in most situations (for BlackBerry Java apps).
Keep in mind that if the amount of data is small (so you'd probably only need one or two buffers at 1KB-8KB), it's probably best just to use the IOUtilities method:
byte[] result = IOUtilities.streamToBytes(inputStream);
with which you don't need to actually pick a buffer size. But, if you know that result would be a large block of data, you're probably right in wanting to read one buffer at a time.
However, I would argue that the answer should almost always be obtained simply by building the app, and measuring performance with a few different values for byte buffer size. It's easy enough to change one constant, build, run and measure again, and then you're not guessing, or taking the advice of someone who doesn't know all the details of your app.
See here for information about BlackBerry Eclipse plugin memory analysis, and
here for BlackBerry Eclipse plugin profiling.
These tools are found in Eclipse by selecting the Window menu, then Show View -> Other... -> BlackBerry -> BlackBerry Memory Statistics View, or BlackBerry Profiler View, while debugging.
This way, you can see how much memory, or processor, the network code is using during the call to retrieve data and populate your buffer.
More
BlackBerry InputStream to String conversion
This question was also asked in the official BlackBerry forum here:
http://supportforums.blackberry.com/t5/Java-Development/What-is-the-best-size-for-a-buffer-in-BlackBerry/td-p/2559417
The OP gave this clarification:
"I am reading from network. Once I establish socket connection with the server, the server will send me notifications one after the other. I need to read the notifications/data from the inputstream available in the socket connection. For this I have a background thread which checks anything is available in the inputstream and if something is available, it will read with the help of a buffer and then passes the read data to a StringBuffer."
Given this information, I have a different take, in that I think the BlackBerry network handling abstracts the Java application from the network buffer processing to the extent that the application buffer size will have little if any impact on the performance.
But be aware, this is only my opinion.
My response on that thread was as follows:
First thing to note is that the method "isAvailable()", in my experience, does not work correctly on OS 5.0 and earlier. It is fixed in OS 6 (at least from my testing).
Because isAvailable() was broken, (and for other application reasons) what I have implemented for a socket connection is that each message is preceded by a length. So in the socket connection, I read the length of the next message, and then the actual data. This is done with no blocking - in other words I read the entire message, regardless of size. I recommend you do the same. The message must exist in full somewhere so it makes no difference if it is in some memory managed by the socket connection, or in some memory managed by you.
Note also, until OS 6.0, when you did the read you would get all the data to fill the buffer you had - in other words it waited till the buffer was full. In OS 6.0 and later, the read can complete without giving you a full buffer.
In your case, you might be working in a post OS 6.0 only, so you could use isAvailable() - create a buffer of that size, and read everything. I can't see that it makes any difference whether you have the bytes in memory managed by the socket, or memory managed by you.
But in fact, I would argue that the best approach is the one that makes your processing simplest. So for example, if you know that the next message is 200 bytes, then read 200 bytes, and then process that message. Then read the next message.
You could spend a lot of time attempting to manage the buffers to match the underlying socket buffers. I don't know exactly how the underlying BlackBerry socket processing code works, but it doesn't put data directly into your buffers. So let it manage its buffer size to optimize the network, you manage your buffer size to optimize your processing. That will work best for everyone.

Determining available video memory

When developing an OpenGL program, is there a way to poll from the system to find out just how many megabytes are available to store textures, etc?
Or is the standard approach these days just allocate memory and forget about everything?
Although the official stance remains "you don't need to know, you don't want to know, and it would not help you anyway", luckily at least two IHVs have shown a little more insight lately and offer extensions to query that information:
NVX_gpu_memory_info
ATI_meminfo
One nice thing about these extensions is that they have a least common denominator which is just what most people need, and you don't need to query extension support or do anything special, as they both work via glGetIntegerv.
In the easiest case, you can just initialize an array of 4 integers to zero (or some minimum default value that you'll assume in case the extensions don't work), then you call glGetIntegerv twice (with GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX and TEXTURE_FREE_MEMORY_ATI, respectively), and finally call glGetError to clear the error state. glGetIntegerv does not modify the pointed-to memory if it fails, nor does it crash or any other bad thing -- it merely sets the error state to GL_INVALID_ENUM.
Both extensions return a value in the first array position, the ATI one returns some values in the other 3 too.
n.b.: glAreTexturesResident has not been supported for almost a decade on mainstream hardware, in the same manner as texture priorities. The common mantra is that the driver writer knows much better than you anyway.
OpenGL doesn't give you this information. And frankly: There's only little benefit, simply because today we have multitasking operating systems. The OpenGL driver is responsible for swapping in texture data to/from system memory, if there's demand for it.
What OpenGL can do for you, is tell, if the textures you've uploaded are still resident in fast memory. The function is called "glAreTexturesResident". You can use this to gradually upload stuff to the GPU until you've filled up the GPU's memory. But keep in mind that you're not the only user of the GPU.

Resources