Zedboard Transfer Data from SD Card to DDR - zynq

I have a file on an SD card that I want to transfer to the DDR memory on the Zedboard. I am using a bare-metal application to do this. This works for data smaller than 2048 bytes, but when the data size reaches 2048 bytes or more, the Zynq processor hangs during the transfer.
Function call to read SD Card data and transfer to DDR memory:
FileOpResult = f_read(&fil_obj, (void *)DDRDestAddr, DDRTxSize, &br);
DDRDestAddr is XPAR_PS7_DDR0_S_AXI_BASEADDR (0x00100000)
The Zynq processor hangs when DDRTxSize is 2048 or more.
Is there some limit on the amount of data that I can transfer from the SD Card to the DDR memory? Where can I change this? Or is there some fundamental mistake I made?
Update:
Ok, it turns out my problem is solved if I simply transfer the data to another region of DDR memory instead, at 0x00200000. I'm not sure exactly why 0x00100000 doesn't work, when xparameters.h clearly shows it as the base address of the DDR memory.
I successfully transferred ~13 megabytes from the SD card to DDR starting at address 0x00200000.
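
For reference, here is a minimal sketch of the working approach, reading the file in chunks into DDR at the 0x00200000 address mentioned above. It assumes the standard xilffs/FatFs API (f_mount, f_open, f_read); the function name, path handling, and chunk size are illustrative, not taken from the original code.

    #include "ff.h"            /* FatFs (xilffs): f_mount, f_open, f_read */
    #include "xil_types.h"
    #include "xil_printf.h"

    #define DDR_DEST_ADDR  0x00200000U   /* destination that worked above */
    #define CHUNK_SIZE     4096U         /* illustrative read chunk size  */

    int load_file_to_ddr(const char *path)
    {
        static FATFS fs;
        FIL fil;
        FRESULT rc;
        UINT br = 0;
        u8 *dest = (u8 *)DDR_DEST_ADDR;

        rc = f_mount(&fs, "0:/", 0);                 /* register SD volume */
        if (rc != FR_OK) return -1;

        rc = f_open(&fil, path, FA_READ);
        if (rc != FR_OK) return -1;

        /* Read until f_read returns fewer bytes than requested (end of file). */
        do {
            rc = f_read(&fil, dest, CHUNK_SIZE, &br);   /* note: &br, not *br */
            if (rc != FR_OK) { f_close(&fil); return -1; }
            dest += br;
        } while (br == CHUNK_SIZE);

        f_close(&fil);
        xil_printf("Loaded %d bytes\r\n", (int)(dest - (u8 *)DDR_DEST_ADDR));
        return 0;
    }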


Related

Store data in DDR3 from PL in SoC Zynq 7020

I have a ZTurn-Board with a 7020 processor featuring a total of 1GB of DDR3 memory connected to the PS.
Due to the needs of my project, from the PL I will be reading a total of 4*2584 = 10336 consecutive 8-bit values, with very precise timing control (I receive four 8-bit values at a time at 2 MHz).
So I was wondering whether it is possible to store all the data I am generating in the DDR3 memory from the PL until the process is finished, and then, once finished, send it from the PS to the PC, either over UART or GbE. If it is possible to store all this data, which IP core should I look into?
Would it be possible to keep storing data from the PL until the 1 GB of DDR3 memory is completely full?
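
One possible approach (an assumption on my part, not something stated in the question) is to stream the PL data into an AXI DMA IP whose S2MM channel writes it into the PS DDR, then send the buffer out from the PS afterwards. A minimal bare-metal sketch using the Xilinx XAxiDma driver in simple (non-scatter-gather) polled mode; the device ID, buffer address, and length are placeholders:

    #include "xaxidma.h"
    #include "xparameters.h"
    #include "xil_cache.h"
    #include "xstatus.h"

    #define DMA_DEV_ID     XPAR_AXIDMA_0_DEVICE_ID  /* from xparameters.h          */
    #define RX_BUFFER_BASE 0x01000000U              /* DDR buffer (placeholder)    */
    #define RX_LENGTH      10336U                   /* 4*2584 bytes, per question  */

    static XAxiDma AxiDma;

    int receive_pl_data(void)
    {
        XAxiDma_Config *cfg = XAxiDma_LookupConfig(DMA_DEV_ID);
        if (!cfg || XAxiDma_CfgInitialize(&AxiDma, cfg) != XST_SUCCESS)
            return XST_FAILURE;

        /* Make sure stale cache lines don't hide what the DMA writes to DDR. */
        Xil_DCacheInvalidateRange(RX_BUFFER_BASE, RX_LENGTH);

        /* Start an S2MM (stream-to-memory-mapped) transfer into DDR. */
        if (XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)RX_BUFFER_BASE,
                                   RX_LENGTH, XAXIDMA_DEVICE_TO_DMA) != XST_SUCCESS)
            return XST_FAILURE;

        /* Poll until the DMA engine has finished writing the buffer. */
        while (XAxiDma_Busy(&AxiDma, XAXIDMA_DEVICE_TO_DMA))
            ;

        Xil_DCacheInvalidateRange(RX_BUFFER_BASE, RX_LENGTH);
        return XST_SUCCESS;
    }

At 4 bytes every 2 MHz the stream is only about 8 MB/s, so bandwidth is not the limiting factor; filling a larger portion of the 1 GB is mainly a matter of buffer management (for example, repeated transfers into successive DDR regions).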

Memcpy from PCIe memory takes more time than memcpy to PCIe memory

I am trying to do read/write data to/from a Linux PC from/to a PCIe 2.0 (2 lane) device. The memory for reading and writing are at different RAM locations in the PCIe device. Those memories are mapped in Linux PC using ioremap. My use case is to achieve 18MBytes/second read/write throughput which is obviously supported by the PCIe link. The memory at the PCIe device is uncached.
I am able to achieve the write throughput, i.e. when I write from Linux PC local memory to PCIe device memory using memcpy. The memcpy takes less than 1 ms for 9216 bytes of data in this case. But when I read the ioremapped PCIe memory into Linux local memory, data loss occurs. I profiled the memcpy and it takes more than 1 ms, sometimes 2 ms, for 9216 bytes of data. I don't want to use DMA for this operation.
Any thoughts on what can be the problem in this case? How can I handle this?
That's entirely expected, and there is nothing you can do about it. The CPU can only issue serialized word-sized reads and writes, which have very poor throughput over the PCIe link due to protocol overheads. Every operation carries 24 or 28 byte-times of overhead: a 12- or 16-byte TLP header plus 12 byte-times of link-layer overhead. Since the CPU can only operate on 4 or 8 bytes at a time, this is at best 25% efficient (8/(8+24) = 25%) and at worst 12.5% efficient (4/(4+28) = 12.5%).
The protocol overhead is not the only issue, however. Writes in PCIe are posted, so the CPU can simply issue a bunch of back-to-back writes which eventually make their way onto the bus and to the device. On the other hand, when reading, the CPU can only issue a single read operation, wait for it to traverse the bus twice, store the result, issue another read, etc. Since it can only operate on 8 bytes at a time, the performance is horrible due to the relatively high latency over the PCIe bus (can be on the order of microseconds for each transfer).
The solution? Use DMA. PCIe is specifically designed to support efficient DMA operations over the bus, since devices can issue much larger read and write operations, at minimum 128 bytes per operation.
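To put numbers on the efficiency argument, here is a small sketch of the per-operation efficiency formula used above (the header and link-layer sizes are the ones quoted in the answer; the 128-byte case is an assumed DMA payload size):

    #include <stdio.h>

    /* payload / (payload + TLP header + link-layer overhead), per the answer */
    static double efficiency(unsigned payload, unsigned header, unsigned link)
    {
        return (double)payload / (double)(payload + header + link);
    }

    int main(void)
    {
        printf("8-byte access, 12-byte header : %.1f%%\n",
               100.0 * efficiency(8, 12, 12));    /* best case:  25.0% */
        printf("4-byte access, 16-byte header : %.1f%%\n",
               100.0 * efficiency(4, 16, 12));    /* worst case: 12.5% */
        printf("128-byte DMA,  12-byte header : %.1f%%\n",
               100.0 * efficiency(128, 12, 12));  /* roughly 84%       */
        return 0;
    }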

x86 protected mode memory management

I'm a newbie to the x86 CPU.
I have read the material about protected-mode memory management in x86.
The material is the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1.
I believe I understand the steps the CPU takes when accessing memory:
the selector register is an index into the segment descriptor table, the descriptor table entry gives the base of the segment, and the linear address is the sum of the segment base and the 32-bit offset.
But what confuses me is that it seems the CPU cannot know which memory address it will access until all of the steps above are finished. If the CPU wants to access a specific memory address, it must know the selector value and the offset. My question is: how does it know them? The only information the CPU has is the memory address it wants to access, doesn't it?
How does the CPU know the inputs (selector value, offset) when it only knows the output (memory address)?
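
Conceptually, the translation described in the question can be sketched like this (a simplified model that only shows the index-then-add step, ignoring the TI bit, limit and privilege checks, and paging; the example GDT entry is made up):

    #include <stdint.h>
    #include <stdio.h>

    /* Extract the 32-bit base from an 8-byte x86 segment descriptor:
     * base[15:0] is in bits 16-31, base[23:16] in bits 32-39,
     * base[31:24] in bits 56-63. */
    static uint32_t descriptor_base(uint64_t desc)
    {
        return (uint32_t)(((desc >> 16) & 0xFFFF) |
                          (((desc >> 32) & 0xFF) << 16) |
                          (((desc >> 56) & 0xFF) << 24));
    }

    /* The CPU starts from (selector, offset): the selector's upper 13 bits
     * index the descriptor table, the descriptor supplies the base, and
     * linear = base + offset. */
    static uint32_t linear_address(const uint64_t *gdt, uint16_t selector,
                                   uint32_t offset)
    {
        uint16_t index = selector >> 3;   /* drop RPL (2 bits) and TI (1 bit) */
        return descriptor_base(gdt[index]) + offset;
    }

    int main(void)
    {
        /* Hypothetical GDT: entry 1 describes a segment based at 0x00400000. */
        uint64_t gdt[2] = { 0, 0x00CF9A400000FFFFULL };
        printf("linear = 0x%08X\n",
               (unsigned)linear_address(gdt, 0x08, 0x1234));  /* 0x00401234 */
        return 0;
    }

The inputs here are the selector and the offset supplied by the running code; the linear address is computed from them, not the other way around.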
... by:
Microprocessor Real Time Clocks or Timer Chips, a periodic function called the 'clock signal'
the Memory Controller Hub
the Advanced Configuration and Power Interface (ACPI)
ROM, a non-volatile memory inside chips (Real Mode Memory Map)
The Local Descriptor Table (LDT) is a memory table used in the x86 architecture in protected mode and containing memory segment descriptors: start in linear memory, size, executability, writability, access privilege, actual presence in memory, etc.
Interrupt descriptor table, is a data structure used by the x86 architecture to implement an interrupt vector table. The IDT is used by the processor to determine the correct response to interrupts and exceptions.
Intel 8259 is a Programmable Interrupt Controller (PIC) designed for the Intel 8085 and Intel 8086 microprocessors. The initial part was the 8259; a later A-suffix version was upward compatible and usable with the 8086 or 8088 processor. The 8259 combines multiple interrupt input sources into a single interrupt output to the host microprocessor, extending the interrupt levels available in a system beyond the one or two levels found on the processor chip.
You are also missing real mode.
See also DOS_Protected_Mode_Interface & Virtual Control Program Interface.
How does the timer chip control the reset line of the CPU?
See also OSCILLATOR CIRCUIT WITH SIGNAL BUFFERING AND START-UP CIRCUITRY from Google Patents.
real time clock
The CPU 'starts' executing code stored in ROM on the motherboard at address FFFF0h.
The routine tests the central hardware and searches for the video ROM.
...
So, is it not the CPU that 'starts', because it is the power supply line that 'starts' it?
The power supply signal is sent to the motherboard, where it is received by the processor timer chip that controls the reset line to the processor.
How does the BIOS detect RAM? See also serial presence detect and the power-on self-test (POST).
The BIOS is a 16-bit program running in real mode.
The BIOS begins its POST when the CPU is reset. The first memory location the CPU tries to execute is known as the reset vector. In the case of a hard reboot, the northbridge will direct this code fetch (request) to the BIOS located on the system flash memory. For a warm boot, the BIOS will be located in the proper place in RAM and the northbridge will direct the reset vector call to the RAM
What is this reset vector?
The reset vector is the default location a central processing unit will go to find the first instruction it will execute after a reset.
The reset vector is a pointer or address, where the CPU should always begin as soon as it is able to execute instructions. The address is in a section of non-volatile memory initialized to contain instructions to start the operation of the CPU, as the first step in the process of booting the system containing the CPU.
The reset vector for the 8086 processor is at physical address FFFF0h (16 bytes below 1 MB). The value of the CS register at reset is FFFFh and the value of the IP register at reset is 0000h to form the segmented address FFFFh:0000h, which maps to physical address FFFF0h.
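The segmented-to-physical arithmetic mentioned here is simply physical = segment * 16 + offset; a one-line check of the FFFFh:0000h reset vector:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Real-mode address formation: physical = (segment << 4) + offset. */
        uint16_t cs = 0xFFFF, ip = 0x0000;
        uint32_t physical = ((uint32_t)cs << 4) + ip;
        printf("%04X:%04X -> %05Xh\n", cs, ip,
               (unsigned)physical);   /* prints FFFF:0000 -> FFFF0h */
        return 0;
    }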
About northbridge
A northbridge or host bridge is one of the two chips in the core logic chipset architecture on a PC motherboard, the other being the southbridge. Unlike the southbridge, northbridge is connected directly to the CPU via the front-side bus (FSB)
Sources:
"80386 Programmer's Reference Manual" (PDF). Intel. 1990. Section 10.1 Processor State After Reset
"80386 Programmer's Reference Manual" (PDF). Intel. 1990. Section 10.2.3 First Instruction,

How can a PCIe card dma data into CPU ram?

This is in reference to this answer given to a similar DMA/PCI question. I gathered from this answer that the PC does not have a DMA engine capable of transferring data to/from a PCI card, and that the PCI card must provide the DMA capabilities. I have received similar answers from colleagues saying, "A two-way DMA needs to be on the FPGA (referring to the PCI card) to enable burst transfers to/from CPU memory."
My understanding is that when the PC receives a read request, it needs to fulfill it by creating a return packet with the requested data. So, if the card requests a page of data (4096 bytes), the PC needs to return a packet with 4096 bytes. How does the card's DMA reach across the bus and use its DMA to fill the needed packet, as this answer suggests?
I think there might be a misunderstanding here. The card does not "reach across" the bus to use a DMA function in the PC.
The card itself is a bus master. It can directly read and write the entire memory of the PC, just like the CPU can.
From the PC memory system point of view, there is no difference between the card or the main CPU in the PC. Both are bus masters. Both can perform reads and writes to memory.
Bursts of 4096 bytes are not supported. You will have to split the transfer into multiple smaller bursts.
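To illustrate that last point: a 4096-byte request gets broken into bursts no larger than the negotiated maximum payload size. The 128- and 256-byte values below are typical, assumed figures, not something fixed by the question:

    #include <stdio.h>

    int main(void)
    {
        const unsigned total = 4096;               /* requested page-sized transfer  */
        const unsigned payloads[] = { 128, 256 };  /* typical PCIe max payload sizes */

        for (unsigned i = 0; i < 2; i++) {
            unsigned mps  = payloads[i];
            unsigned tlps = (total + mps - 1) / mps;   /* bursts (TLPs) needed */
            printf("%u bytes at %u-byte max payload -> %u bursts\n",
                   total, mps, tlps);
        }
        return 0;
    }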

Realistic data rate over PCI bus using DMA?

What is the realistic data transfer rate over a 32-bit/33 MHz PCI bus? We need to transfer 32K 32-bit samples from a PCI card to an Intel CPU running Windows. I would expect the block to transfer in about 1 ms, but it is taking 40 ms. The PCI board has a PLX PCI-9056. We are accessing card memory through a virtual address, but our CPU is bricked out, which makes me think the data rate is being held up by CPU involvement. If we go to DMA, will the transfer be closer to 1 ms? The reason I have my doubts is that the PLX SDK User Manual states:
"BAR space memory read/write is generally slow in relative terms. Reads are typically only 2-4MB/s."
You should check if you can enable burst mode and continuous burst, such that multiple DWords can be transmitted without new address cycles. This makes things much faster. The PLX PCI9056 supports this option, but it must be set by SW accordingly.
We have data rates up to 90 MB/s with DMA Master Transfer on our custom designed frame grabber card.
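Working through the numbers in this exchange (the 2-4 MB/s figure is from the quoted manual, the 90 MB/s from the answer above, and 132 MB/s is the theoretical 32-bit/33 MHz PCI peak):

    #include <stdio.h>

    int main(void)
    {
        const double bytes = 32768.0 * 4.0;   /* 32K 32-bit samples = 128 KiB */

        printf("BAR reads @ 3 MB/s  : %.1f ms\n", bytes / 3e6  * 1e3);   /* ~43.7 ms */
        printf("DMA       @ 90 MB/s : %.1f ms\n", bytes / 90e6 * 1e3);   /* ~1.5 ms  */
        printf("PCI peak  @ 132 MB/s: %.1f ms\n", bytes / 132e6 * 1e3);  /* ~1.0 ms  */
        return 0;
    }

The ~44 ms figure for BAR-space reads lines up with the 40 ms being observed, and a DMA master transfer would indeed bring the block close to the 1 ms estimate.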
