Loading pen drive sectors - BIOS

How can we load sectors of a pen drive using BIOS interrupts?
Low-level disk access is needed for booting from the pen drive.
I've heard that we can use int 13h to load sectors, but how do we use that interrupt to load sectors of a pen drive? What parameters are required to load the sectors?
How can we load a given sector, say sector 2560, of a pen drive?
How do we calculate the disk parameters to be used with int 13h?
How do we get the cylinder and track numbers for a pen drive?

If the BIOS can boot a pen drive then the BIOS is emulating a hard drive or floppy drive for the pen drive. int 13h with AH=48h or AH=08h asks the BIOS for the geometry of the emulated hard drive or floppy drive. Given the sectors per track (SPT) and heads per cylinder (HPC) from that geometry, an LBA such as 2560 converts to CHS as: cylinder = LBA / (HPC * SPT), head = (LBA / SPT) mod HPC, sector = (LBA mod SPT) + 1. Alternatively, if the BIOS supports the int 13h extensions (check with AH=41h, BX=55AAh), then AH=42h reads by LBA directly and no conversion is needed.
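
For illustration, a minimal real-mode sketch (untested; the destination buffer at 0000:8000 is an assumption) that reads sector 2560 using the int 13h extended read, which takes the LBA in a disk address packet:

bits 16

read_lba_2560:
    mov si, dap             ; DS:SI -> disk address packet
    mov ah, 0x42            ; int 13h extension: extended read
    ; DL = drive number, as handed to the boot sector by the BIOS
    int 0x13
    jc  disk_error          ; CF set on failure
    ret

disk_error:
    hlt
    jmp disk_error

align 4
dap:
    db 0x10                 ; packet size (16 bytes)
    db 0                    ; reserved
    dw 1                    ; number of sectors to read
    dw 0x8000               ; destination offset
    dw 0                    ; destination segment
    dq 2560                 ; starting LBA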


Counting number of allocations into the Write Pending Queue - unexpected low result on NV memory

I am trying to use some of the uncore hardware counters, such as skx_unc_imc0-5::UNC_M_WPQ_INSERTS, which is supposed to count the number of allocations into the Write Pending Queue. The machine has 2 Intel Xeon Gold 5218 CPUs (Cascade Lake architecture), with 2 memory controllers per CPU. The Linux version is 5.4.0-3-amd64. I have the following simple loop and I am reading this counter for it. Array elements are 64 bytes in size, equal to a cache line.
for (int i = 0; i < 1000000; i++) {
    array[i].value = 2;
}
For this loop, when I map the memory to the DRAM NUMA node, the counter gives around 150,000 as a result, which perhaps makes sense: there are 6 channels in total for the 2 memory controllers in front of this NUMA node, which use DRAM DIMMs in interleaving mode. Then for each channel there is one separate WPQ, I believe, so skx_unc_imc0 gets 1/6 of the total stores. There are skx_unc_imc0-5 counters, which I got with papi_native_avail, supposedly one for each channel.
The unexpected result comes when, instead of mapping to the DRAM NUMA node, I map the program to non-volatile memory, which is presented as a separate NUMA node on the same socket. There are 6 NVM DIMMs per socket, which form one interleaved region. So when writing to NVM, there should similarly be 6 different channels in use and, in front of each, the same single WPQ, which should again get 1/6 of the write inserts.
But UNC_M_WPQ_INSERTS returns only around 1,000 as a result on NV memory. I don't understand why; I expected it to give similarly around 150,000 writes into the WPQ.
Am I interpreting/understanding something wrong? Or are there two different WPQs per channel depending on whether the write goes to DRAM or NVM? Or what else could be the explanation?
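
For reference, a minimal sketch (not from the original post; error handling omitted) of how such an uncore counter can be read through PAPI's named-event interface, using the event string quoted above:

#include <papi.h>
#include <stdio.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long count = 0;

    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&evset);
    /* uncore event name as listed by papi_native_avail */
    PAPI_add_named_event(evset, "skx_unc_imc0::UNC_M_WPQ_INSERTS");

    PAPI_start(evset);
    /* ... run the store loop from the question here ... */
    PAPI_stop(evset, &count);

    printf("WPQ inserts (imc0): %lld\n", count);
    return 0;
}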
It turns out that UNC_M_WPQ_INSERTS counts the number of allocations into the Write Pending Queue, only for writes to DRAM.
Intel has added a corresponding hardware counter for persistent memory: UNC_M_PMM_WPQ_INSERTS, which counts write requests allocated in the PMM Write Pending Queue for Intel® Optane™ DC persistent memory.
However, no such native event shows up in papi_native_avail, which means it can't be monitored with PAPI yet. In Linux version 5.4, some of the PMM counters can be found directly in perf list uncore, such as unc_m_pmm_bandwidth.write - Intel Optane DC persistent memory bandwidth write (MB/sec), derived from unc_m_pmm_wpq_inserts, unit: uncore_imc. This implies that even though UNC_M_PMM_WPQ_INSERTS is not directly listed in perf list as an event, it should exist on the machine.
As described here, the EventCode for this counter is 0xE7, so it can be used with perf as a raw hardware event descriptor as follows: perf stat -e uncore_imc/event=0xe7/. However, it seems that it does not support event modifiers to specify user-space counting with perf. Then, after pinning the thread to the same socket as the NVM NUMA node, for the program that basically only does the loop described in the question, the result of perf more or less makes sense:
Performance counter stats for 'system wide':

    1,035,380      uncore_imc/event=0xe7/
So far this seems to be the best guess.

What is the size of memory in this diagram?

I want to ask some questions about this diagram, which shows main memory with the OS and different processes: how can I compute the size of main memory in kilobytes? And what will happen if Process B generates a logical address of 200? Will the CPU return a physical address or an error?
I'd assume the unlabeled numbers on the left are addresses in bytes, which would imply there's 2048 bytes (or 2 KiB) of something (virtual space, or physical space, or maybe even RAM if there are no devices mapped into the physical space). Of course it could just as easily be 2048 bits, or 2048 (36-bit) words, or...
If Process B tries to access logical address 200, it might work (no security), or it might cause some kind of trap/exception because the process doesn't have permission to access the operating system's area; or it could be impossible for the process to do that at all (e.g. maybe the design of the CPU restricts the process to unsigned offsets from a base address of 1203).
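As a worked illustration (assuming a plain base+limit scheme, which the diagram may or may not use): with a base register of 1203, logical address 200 would translate to physical address 1203 + 200 = 1403, and the access would only succeed if 200 were below the process's limit; otherwise the hardware would raise a fault instead of returning an address.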

How to detect or probe or scan usable/accessible physical memory?

How to exhaustively detect or probe or scan usable/accessible physical memory?
I'm currently making a custom bootloader in NASM for a x86_64 custom operating system.
In order to decide which physical address should contain which data, I want to make sure the memory is guaranteed free for use. I've already tried BIOS interrupt int 0x15, eax=0xE820 and checked out the memory resources in Device Manager.
The problem is that none of them covers everything.
For example,
it says 0x0000000000000000 ~ 0x000000000009FC00 is usable.
But strictly speaking, 0x0000000000000000 ~ 0x0000000000000500 is not usable because it stores the IVT and the BDA.
Also, there are PCI holes here and there.
My objective here is to detect, probe, or scan the entire memory available on my hardware and build a memory map, so that I can tell which address is which. (Example map below.)
0x0000000000000000 ~ 0x00000000000003FF : Real Mode IVT
0x0000000000000400 ~ 0x00000000000004FF : BDA
...
0x0000000000007C00 ~ 0x0000000000007DFF : MBR Load Address
...
0x00000000000B8000 ~ 0x00000000000B8FA0 : VGA Color Text Video Memory
...
0x00000000C0000000 ~ 0x00000000FFFFFFFF : PCI Space
My processor is an 8th-gen Intel i7-8700K.
How much information do you want?
If you only want to know which areas are usable RAM, then "int 0x15, eax=0xE820" (with the restriction that the BDA will be reported as usable), or UEFI's "get memory map" function, is all you need. Note that in both cases one or more areas may be reported as "ACPI reclaimable", which means that after you've finished parsing the ACPI tables (or if you don't care about the ACPI tables) that RAM becomes usable.
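
For illustration, a minimal real-mode sketch (untested; the buffer location at 0000:8004 is an assumption) of walking the E820 map:

bits 16

e820_scan:
    xor ax, ax
    mov es, ax              ; ES:DI -> destination buffer
    mov di, 0x8004
    xor ebx, ebx            ; continuation value: 0 for the first call
.next:
    mov eax, 0xE820
    mov edx, 0x534D4150     ; 'SMAP' (reload each pass; some BIOSes clobber it)
    mov ecx, 24             ; ask for a 24-byte entry
    int 0x15
    jc  .done               ; CF set: error, or end of map on some BIOSes
    cmp eax, 0x534D4150     ; BIOS echoes 'SMAP' on success
    jne .done
    ; ES:DI now holds: base (qword), length (qword), type (dword)
    add di, 24              ; store entries consecutively
    test ebx, ebx
    jnz .next               ; EBX = 0 after the last entry
.done:
    ret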
If you want more information you need to do more work, because the information is scattered everywhere. Specifically:
ACPI's SRAT table describes which things (e.g. which areas of memory) are in which NUMA domain; and ACPI's SLIT table describes the performance implications of that.
ACPI's SRAT table also describes which things (e.g. which areas of memory) are "hot plug removable" and reserved for "hot insert".
the CPU's CPUID instruction will tell you the "physical address size in bits" (see the sketch after this list). This is useful to know if/when you're trying to find a suitable area of the physical address space to use for a memory-mapped PCI device's BARs, because the memory maps you get are too silly to tell you the difference between "not usable by memory mapped PCI devices", "usable by memory mapped PCI devices" and "used by memory mapped PCI devices".
parsing (or configuring) PCI configuration space (in conjunction with IOMMUs if necessary) tells you which areas of the physical address space are currently used by which PCI devices
parsing "System Management BIOS" tables can (with a lot of work and "heuristical fumbling") tell you which areas of the physical address space correspond to which RAM chips on the motherboard and what the details of those RAM chips are (type, speed, etc).
various ACPI tables (e.g. MADT/APIC and HPET) can be used to determine the location of various special devices (local APICs, IO APICs, HPET).
you can assume that (part of) the area ending at physical address 0xFFFFFFFF will be the firmware's ROM; and (with some more "heuristical fumbling" to subtract any special devices from the area reported as "reserved" by the firmware's memory map) you can determine the size of this area.
If you do all of this you'll have a reasonably complete map describing everything in the physical address space.
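
For the CPUID item above, a minimal sketch (assuming leaf 0x80000008 exists, which should first be verified via leaf 0x80000000):

bits 32

get_phys_bits:
    mov eax, 0x80000008     ; extended leaf: address sizes
    cpuid                   ; clobbers EAX, EBX, ECX, EDX
    movzx eax, al           ; EAX[7:0] = physical address size in bits
    ret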

How to draw a pixel on the screen in protected mode in x86 assembly?

I am creating a little bootloader+kernel, and so far I have managed to read the disk, load the second sector, load a GDT, open the A20 line and enable protected mode.
I jumped to the 32-bit function that shows me a character on the screen, using the video memory for text (starting at 0x000B8000):
pusha
mov edi, 0xB8000        ; color text mode video memory
mov bl, '.'             ; the character to display
mov dl, bl              ; DL = character byte
mov dh, 63              ; DH = attribute byte (colors)
mov word [edi], dx      ; write character + attribute to the first cell
popa
Now, I would like to go a little further and draw a single pixel on the screen. From what I read on some websites, if I want to use the VGA's graphics mode, I have to write my pixel at location 0x000A0000. Is that right?
Now, what is the format of a single pixel? For a single character you need an ASCII code and an attribute, but what do you need to define a pixel (if it works the same way as text mode)?
Unfortunately, it's a little more than a little further.
The rules for writing to video memory depend on the graphics mode. Among traditional video modes, VGA mode 320x200 (8bpp) is the only one where video memory behaves like a normal kind of memory: you write a byte corresponding to a pixel you want to the video buffer starting from 0xA000:0000 (or 0xA0000 linear), and that's all.
For other VGA (pre-SVGA) modes, the rules are more complicated: when you write a byte to video memory, you address a group of pixels, and some VGA registers which I have long since forgotten specify which planes of those pixels are updated and how the old value of them is used. It's not just memory any more.
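
For reference, filling in the registers mentioned above: the VGA Sequencer's Map Mask register selects which of the four planes subsequent writes touch. A minimal sketch (the all-planes mask is just an example):

mov dx, 0x3C4       ; VGA Sequencer index port
mov al, 2           ; index 2: Map Mask register
mov ah, 0x0F        ; one bit per plane; 0x0F enables all four
out dx, ax          ; word OUT writes index (AL) and data (AH) together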
There are SVGA modes (starting with 800x600 8bpp); you can switch to them in a hardware-independent way using the VESA Video BIOS Extensions (VBE). In those modes, video memory behaves like memory again, with 1, 2, 3 or 4 bytes per pixel and no VGA-like 8-pixel groups touched with one byte access. The problem is that the real-mode video buffer is no longer large enough to address the whole screen.
VESA VBE 1.2 addressed this problem by providing functions to modify the memory window base: at any particular moment, the segment at linear 0xA0000 addresses a 64 KB region of video memory, but you can control which 64 KB of the whole framebuffer is available at this address (the minimal unit of base address adjustment, a.k.a. window granularity, depends on the hardware, but you can rely on the ability to map an N*64 KB offset at 0xA0000). The downside is that it requires a VBE BIOS call each time you start working with a different 64 KB chunk.
VESA VBE 2.0 added a flat framebuffer, available at some high address in protected mode (also in unreal mode). Thus a VBE BIOS call is required to enter the video mode, but not to draw pixels (see the sketch after this list).
VESA VBE 3.0, which might not be portable enough yet, provides a way to call VBE functions in protected mode. (I didn't have a chance to try it; it didn't exist during my "OS in assembly" age.)
Anyway, you have to switch to graphics mode first. There are several variants of doing that:
The easiest thing to do is to use a BIOS call before you enter protected mode. With VBE 2.0, you won't need video memory window adjustment calls.
Another way is creating a V8086-mode environment which is good enough for BIOS. The hardest part is forwarding interrupts to real-mode interrupt handlers. It's not easy, but when it's done, you'll be able to switch video modes in PM and use some other BIOS functions (for disk I/O, for example).
Yet another way is to use VESA VBE 3.0 protected mode interface. No idea on how easy or complicated it might be.
And a real Jedi way is digging out the information on your specific video card, switching modes by setting its registers. Been there, done that for some Cirrus card in the past -- getting big plain framebuffer in PM was not too complicated. It's unportable, but maybe it's just what you need if the aim is understanding the internals of your machine.
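
Tying the VBE 2.0 case together: a minimal protected-mode sketch (untested) for a 32bpp mode, where FB_BASE and PITCH are placeholders for values that must be read from the VBE ModeInfoBlock (PhysBasePtr and BytesPerScanLine) at runtime:

; in: eax = x, ebx = y, ecx = 0x00RRGGBB color
plot_pixel_32bpp:
    imul ebx, dword [PITCH] ; byte offset of row y
    lea  eax, [ebx + eax*4] ; plus 4 bytes per pixel for column x
    add  eax, [FB_BASE]     ; linear framebuffer base address
    mov  [eax], ecx         ; write the pixel
    ret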
It depends on the graphics mode in use, and there are a lot of differences. BIOS VGA video mode 13h (320x200 at 8 bits/pixel) is probably the easiest to get started with (and it's the only BIOS VGA video mode with 256 colors; however, you can create your own modes by writing directly to the video card's ports). In BIOS video mode 13h, the video memory mapped to the screen begins at 0xA0000 and runs contiguously, 1 byte for each pixel, with only one bit plane, so each coordinate's memory address is 0xA0000 + 320*y + x:
To change to BIOS video mode 13h (320 x 200 at 8 bits/pixel) while in real mode:
mov ax, 0x13    ; AH=0x00: set video mode, AL=0x13
int 0x10
To draw a pixel in the upper left corner (in video mode 13h) while in protected mode:
mov edi, 0x0A0000  ; start of mode 13h video memory
mov al, 0x0F       ; the color of the pixel (palette index)
mov [edi], al      ; plot at (0, 0)
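
And a minimal sketch generalizing this to an arbitrary coordinate in protected mode (register assignments are illustrative):

; in: eax = x (0..319), ebx = y (0..199), cl = palette color
plot13h:
    imul ebx, ebx, 320              ; row offset = y * 320
    mov  byte [0xA0000 + ebx + eax], cl
    ret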
A complete real-mode example as a DOS .COM program (pixel.asm), which switches to BIOS video mode 12h (640x480, 16 colors) and plots one pixel through the BIOS "write graphics pixel" service (AH=0Ch, with BH = page, CX = column, DX = row):

org 100h
bits 16
cpu 386

section .text
START:
    mov ax, 12h     ; set video mode 12h (640x480, 16 colors)
    int 10h
    mov ah, 0Ch     ; BIOS: write graphics pixel
    mov al, 02h     ; color 2 (green)
    xor bh, bh      ; page 0
    xor cx, cx      ; column 0
    xor dx, dx      ; row 0
    int 10h
    ret             ; return to DOS

Assemble it with:

c:\>nasm pixel.asm -f bin -o pixel.com

Memory access after ioremap very slow

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.
I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:
#define RESERVED_REGION_SIZE   (1 * 1024 * 1024 * 1024) // 1 GB
#define RESERVED_REGION_OFFSET (1 * 1024 * 1024 * 1024) // 1 GB

static int __init memdrv_init(void)
{
    struct timeval t1, t2;

    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday(&t1);
    reservedBlock = ioremap(RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE);
    do_gettimeofday(&t2);
    printk(KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff(&t2, &t1));

    // Set the memory to a known value
    do_gettimeofday(&t1);
    memset(reservedBlock, 0xAB, RESERVED_REGION_SIZE);
    do_gettimeofday(&t2);
    printk(KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff(&t2, &t1));

    // Register the character device
    ...

    return 0;
}
I load the driver, and check dmesg. It reports:
[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec
That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?
This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.
EDIT:
The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
See my eventual solution below.
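
For what it's worth, one conventional alternative (not from the original answers) for a buffer shared between a PCI device and user space is a coherent DMA allocation, which stays cache-coherent on x86; a minimal sketch, where pdev is an assumed struct pci_dev * from the driver's probe function and BUF_SIZE is illustrative:

dma_addr_t bus_addr;   /* address the PCI device should use */
void *cpu_addr;        /* kernel virtual address            */

cpu_addr = dma_alloc_coherent(&pdev->dev, BUF_SIZE, &bus_addr, GFP_KERNEL);
if (!cpu_addr)
    return -ENOMEM;
/* hand bus_addr to the device; mmap the pages behind cpu_addr to user space */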
ioremap allocates uncacheable pages, as you'd want for access to a memory-mapped I/O device. That would explain your poor performance.
You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio, etc. It is not even guaranteed that the return value is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for I/O registers), leading to the terrible performance.
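
For illustration, a minimal sketch of the accessor-based style this answer suggests (untested; error handling abbreviated):

void __iomem *base;

base = ioremap(RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE);
if (!base)
    return -ENOMEM;

memset_io(base, 0xAB, RESERVED_REGION_SIZE);  /* I/O-safe memset   */
writel(0xDEADBEEF, base);                     /* 32-bit MMIO write */
pr_info("first dword: 0x%x\n", readl(base));  /* 32-bit MMIO read  */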
It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.
Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
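
A sketch of that pattern under stated assumptions (the names head, tail, ring, ring_size and local_buf are hypothetical; head is advanced by the hardware):

#define LINE_SIZE 256   /* the cache-line granularity from the answer */

size_t avail = (head - tail) % ring_size;
if (avail >= LINE_SIZE) {
    size_t chunk = avail - (avail % LINE_SIZE);   /* whole lines only */
    memcpy_fromio(local_buf, ring + tail, chunk); /* pull out of uncached memory */
    tail = (tail + chunk) % ring_size;
}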
I have tried out huge memory chunk reservations with memmap.
The ioremap of this chunk gave me a mapped virtual address that is beyond a few terabytes.
When you ask to reserve 128 GB of memory starting at 64 GB, you see the following in /proc/vmallocinfo:
0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap
Thus the address space starts at 0xffffc9001f3a8000 (which is way up there).
Secondly, your observation is correct: even memset_io results in extremely large delays (tens of minutes) to touch all of this memory.
So the time taken mostly has to do with address space conversion and the loading of non-cacheable pages.
