JTAG programming a Xilinx Spartan-3AN FPGA with a CPU in the chain

I have problems programming a Xilinx Spartan-3AN 1400 FPGA over the JTAG interface.
My custom PCB consists of a CPU and an FPGA connected in a JTAG chain, with the CPU in the first position and the FPGA in the second. I can access and program the CPU over JTAG without any problem. When the FPGA is alone in the JTAG chain, programming it is also successful.
The problems occur only when the FPGA and the CPU are in one chain. Identifying the devices in the chain looks OK: both devices are recognized by the Xilinx iMPACT tool, but I get an error when trying to perform the read-ID command on the FPGA.
I have tried lowering the JTAG frequency to 750 kHz with no success. I get the message:
INFO:iMPACT:583 - '2': The idcode read from the device does not match the idcode in the bsdl File.
INFO:iMPACT:1578 - '2': Device IDCODE : 00001111111111111111111111111110
INFO:iMPACT:1579 - '2': Expected IDCODE: 00000010011000110000000010010011
I connected a logic analyzer and captured the signals.
When only the FPGA is connected and everything works as expected, the read-ID command produces the following capture:
[Logic analyzer capture: FPGA alone in the chain]
When the TI AM5726 CPU and the FPGA are in one chain, logging the read-ID command shows this:
[Logic analyzer capture: CPU and FPGA connected in the chain]
The CPU should be bypassed, since six ones (the length of the CPU's IR register) have been sent, and the read-ID command looks the same as in the previous example (001001), but the response to the command is not as expected.
Probably the CPU is not in bypass and is somehow breaking the TDO signal, although judging by the pattern sent, it should be.
I have also checked the signal shape with an oscilloscope and it looks OK.
What could be the cause of this problem?
When I run the "Initialize chain" command at the start of the iMPACT tool, it detects that the second device is a Spartan-3AN 1400, so I suppose it must somehow perform a read-ID operation during initialization.
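To make the expected behaviour concrete, here is a small C sketch of the shift arithmetic I am assuming for this two-device chain (IR lengths and instruction codes as described above; the concatenation order on the wire is my assumption):

    /* Shift arithmetic for the two-device chain: a 6-bit CPU IR loaded
     * with all ones (BYPASS) and a 6-bit FPGA IR loaded with 001001
     * (read ID, as seen in the capture). The concatenation order below
     * is an assumption about the chain wiring. */
    #include <stdio.h>

    int main(void)
    {
        const int cpu_ir_len = 6, fpga_ir_len = 6;
        const unsigned bypass = 0x3F;  /* 111111: BYPASS for the CPU   */
        const unsigned idcode = 0x09;  /* 001001: read ID for the FPGA */

        /* IR scan: both instruction registers shift as one 12-bit path. */
        unsigned ir = (bypass << fpga_ir_len) | idcode;
        printf("combined IR pattern: 0x%03X (%d bits)\n",
               ir, cpu_ir_len + fpga_ir_len);

        /* DR scan: a bypassed device contributes a single 1-bit register,
         * so the 32-bit IDCODE arrives shifted by one TCK and the DR scan
         * must be 33 bits long. */
        printf("expected DR scan length: %d bits\n", 32 + 1);
        return 0;
    }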

Related

Memory transfer between two device in OpenCL

I want to develop an application with OpenCL to run on multiple GPUs. At some point, data from one GPU must be transferred to another. Is there any way to avoid transferring through the host? In CUDA this can be done via the cudaMemcpyPeerAsync function. Is there a similar function in OpenCL?
In OpenCL, a context is treated as a memory space. So if you have multiple devices associated with the same context, and you create a command queue per device, you can potentially access the same buffer object from multiple devices.
When you access a memory object from a specific device, the memory object first needs to be migrated to the device so it can physically access it. Migration can be done explicitly using clEnqueueMigrateMemObjects.
So a simple producer-consumer sequence with multiple devices can be implemented like so (a C sketch follows the outline):
command queue on device 1:
- migrate buffer1
- enqueue kernels that process this buffer
- save the last event associated with buffer1's processing
command queue on device 2:
- migrate buffer1, using the event produced by queue 1 to synchronize the migration
- enqueue kernels that process this buffer
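A minimal C sketch of that outline, assuming both devices share one context, kernel1/kernel2 are already built, and the buffer holds n floats (error checking omitted for brevity):

    /* Producer-consumer across two devices in one OpenCL context,
     * synchronized with events. */
    #include <CL/cl.h>

    void producer_consumer(cl_context ctx,
                           cl_command_queue q1,   /* queue on device 1 */
                           cl_command_queue q2,   /* queue on device 2 */
                           cl_kernel kernel1, cl_kernel kernel2, size_t n)
    {
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    n * sizeof(float), NULL, NULL);

        /* Device 1: migrate the buffer to it, then produce into it. */
        cl_event mig1, done1;
        clEnqueueMigrateMemObjects(q1, 1, &buf, 0, 0, NULL, &mig1);
        clSetKernelArg(kernel1, 0, sizeof(buf), &buf);
        clEnqueueNDRangeKernel(q1, kernel1, 1, NULL, &n, NULL,
                               1, &mig1, &done1);

        /* Device 2: migrate only after device 1's kernel has finished,
         * using done1 to synchronize the migration across queues. */
        cl_event mig2;
        clEnqueueMigrateMemObjects(q2, 1, &buf, 0, 1, &done1, &mig2);
        clSetKernelArg(kernel2, 0, sizeof(buf), &buf);
        clEnqueueNDRangeKernel(q2, kernel2, 1, NULL, &n, NULL,
                               1, &mig2, NULL);

        clFinish(q2);
        clReleaseMemObject(buf);
    }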
How exactly migration occurs under the hood I cannot tell, but I assume that it can either be DMA from device 1 to device 2 or (more likely) DMA from device 1 to host and then host to device 2.
If you wish to avoid the limitation of using a single context, or would like to ensure the data transfer is efficient, you are at the mercy of vendor-specific extensions.
For example, AMD offers DirectGMA technology, which allows explicit remote DMA between a GPU and any other PCIe device (including other GPUs). In my experience it works very nicely.

ESP8266 programming without SDK

There are limitations in the ESP SDK libraries (which are not public), for example the maximum length of a received packet (112 bytes) in promiscuous mode.
I tried contacting them for input and directions, but they seem to reply with nonsense.
Is it possible to program the chip without the SDK, and thus make my own SDK and escape these limitations?
The processor core on the ESP8266 is an Xtensa. The processor core, or let's just call it the processor, is what we program with C, C++, or assembler. The processor's instruction set is made public by the company (Tensilica, now part of Cadence), and once you have the instruction set you can program it directly, or build a compiler, and have complete freedom with the processor.
The processor core is not the complete product, though. Companies like Espressif buy the intellectual-property rights to a processor-core design and build an end product by putting peripherals like SPI, I2C, UART and, in the ESP8266's case, the wifi transceiver around the processor core.
These peripherals are controlled digitally and output to the processor digitally, so the processor can interface with them, but their action can be either digital or analog. For UART, SPI, I2C, etc., Espressif has provided a datasheet that documents all the registers that control each peripheral. It works something like: write what you want to transfer to memory address X, then set bit Y at memory address Z to begin the transfer. For SPI, for example, there are registers to control speed, polarity, phase, etc. for a transfer. Once you know how to control a peripheral at this low level, you can write high-level drivers; Espressif provides those too, but you can write your own.
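As a hypothetical sketch of that write-X-then-set-bit-Y pattern (the register addresses and the bit position below are invented for illustration, not real ESP8266 registers):

    /* Hypothetical bare-metal, memory-mapped peripheral access. The
     * addresses and bit position are invented for illustration; the
     * real values come from the chip's register documentation. */
    #include <stdint.h>

    #define PERIPH_DATA_REG (*(volatile uint32_t *)0x60000100) /* hypothetical X */
    #define PERIPH_CTRL_REG (*(volatile uint32_t *)0x60000104) /* hypothetical Z */
    #define CTRL_START_BIT  (1u << 18)                         /* hypothetical Y */

    void periph_transfer(uint32_t word)
    {
        PERIPH_DATA_REG = word;            /* write to X what you want to transfer */
        PERIPH_CTRL_REG |= CTRL_START_BIT; /* set bit Y on Z to begin the transfer */
        while (PERIPH_CTRL_REG & CTRL_START_BIT)
            ;                              /* busy-wait until hardware clears it */
    }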
For wifi, Espressif hasn't released how the peripheral is interfaced with, so we have to depend on the compiled binaries Espressif ships. You can run 'objdump -t' on 'lib/lib80211.a' to get at least the names of the routines the wifi driver provides. You can call these routines from C or assembler code and go a little lower than Espressif intended, but going any lower would require reverse engineering, manually working out the low-level code in the compiled driver, and nobody is going to take on that risk and time drain.

x86 protected mode memory management

I'm a newbie to the x86 CPU.
I have read all the material about protected-mode memory management on x86.
The material is the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1.
I believe I understand the many steps the CPU performs when accessing memory:
the selector register is an index into the segment descriptor table, the descriptor-table entry gives the base of the segment, and the linear address is the sum of the segment base and the 32-bit offset.
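A minimal C sketch of my understanding of that translation (limit and privilege checks ignored):

    /* Protected-mode logical-to-linear translation as described above:
     * the selector's index field picks a descriptor, and the linear
     * address is the descriptor's base plus the 32-bit offset. Limit
     * and privilege checks are omitted. */
    #include <stdint.h>

    struct segment_descriptor {
        uint32_t base;   /* segment base address */
        uint32_t limit;  /* segment limit (unchecked in this sketch) */
    };

    uint32_t linear_address(const struct segment_descriptor *table,
                            uint16_t selector, uint32_t offset)
    {
        /* Bits 3..15 of the selector index the descriptor table. */
        const struct segment_descriptor *d = &table[selector >> 3];
        return d->base + offset;
    }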
But what confuses me is that it seems the CPU cannot know which memory address it will access until all the steps above are finished. If the CPU wants to access a specific memory address, it must know the selector value and the offset. My question is: how does it know them? Isn't the only information the CPU has the memory address it wants to access?
How does the CPU know the inputs (selector value, offset) when it only knows the output (memory address)?
... by:
- microprocessor real-time clocks or timer chips (the periodic 'clock signal'),
- the Memory Controller Hub,
- the Advanced Configuration and Power Interface (ACPI),
- ROM, non-volatile memory inside chips (the real-mode memory map).
The Local Descriptor Table (LDT) is a memory table used in the x86 architecture in protected mode that contains memory-segment descriptors: start in linear memory, size, executability, writability, access privilege, actual presence in memory, etc.
The interrupt descriptor table (IDT) is a data structure used by the x86 architecture to implement an interrupt vector table. The IDT is used by the processor to determine the correct response to interrupts and exceptions.
The Intel 8259 is a Programmable Interrupt Controller (PIC) designed for the Intel 8085 and Intel 8086 microprocessors. The initial part was the 8259; a later 'A'-suffixed version was upward compatible and usable with the 8086 or 8088 processor. The 8259 combines multiple interrupt input sources into a single interrupt output to the host microprocessor, extending the interrupt levels available in a system beyond the one or two levels found on the processor chip.
You are also missing real mode.
See also DOS_Protected_Mode_Interface and Virtual Control Program Interface.
How does the timer chip control the reset line of the CPU?
See also OSCILLATOR CIRCUIT WITH SIGNAL BUFFERING AND START-UP CIRCUITRY from Google Patents, and the real-time clock.
The CPU 'starts' executing code stored in ROM on the motherboard at address FFFF0h.
That routine tests the central hardware and searches for the video ROM.
...
So it is not really the CPU that 'starts'; it is the power-supply line that 'starts' it.
The power-supply signal is sent to the motherboard, where it is received by the processor timer chip that controls the reset line to the processor.
How does the BIOS detect RAM? See also serial presence detect and power-on self-test (POST).
The BIOS is a 16-bit program running in real mode.
The BIOS begins its POST when the CPU is reset. The first memory location the CPU tries to execute is known as the reset vector. In the case of a hard reboot, the northbridge will direct this code fetch (request) to the BIOS located on the system flash memory. For a warm boot, the BIOS will be located in the proper place in RAM and the northbridge will direct the reset vector call to the RAM
What is this reset vector?
The reset vector is the default location a central processing unit will go to find the first instruction it will execute after a reset.
The reset vector is a pointer or address, where the CPU should always begin as soon as it is able to execute instructions. The address is in a section of non-volatile memory initialized to contain instructions to start the operation of the CPU, as the first step in the process of booting the system containing the CPU.
The reset vector for the 8086 processor is at physical address FFFF0h (16 bytes below 1 MB). The value of the CS register at reset is FFFFh and the value of the IP register at reset is 0000h to form the segmented address FFFFh:0000h, which maps to physical address FFFF0h.
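A one-line C check of that arithmetic (real-mode physical address = segment * 16 + offset):

    /* Real-mode segmented address arithmetic for the 8086 reset vector:
     * physical = (segment << 4) + offset. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t cs = 0xFFFF, ip = 0x0000;
        uint32_t physical = ((uint32_t)cs << 4) + ip;
        printf("FFFFh:0000h -> %05Xh\n", (unsigned)physical); /* prints FFFF0h */
        return 0;
    }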
About the northbridge:
A northbridge or host bridge is one of the two chips in the core-logic chipset architecture on a PC motherboard, the other being the southbridge. Unlike the southbridge, the northbridge is connected directly to the CPU via the front-side bus (FSB).
Sources:
"80386 Programmer's Reference Manual" (PDF). Intel. 1990. Section 10.1 Processor State After Reset
"80386 Programmer's Reference Manual" (PDF). Intel. 1990. Section 10.2.3 First Instruction,

Save the outputs of motes in mote memory

I am using two motes: one has a unicast sender program on it and the other a unicast receiver program. Instead of connecting the receiving mote to a PC, I want to power the motes from batteries and save the output of both motes in mote memory. How can I save the output (printf output) of each mote in mote memory and retrieve it later, after the experiments are complete? Is there any method (built-in functions, commands, or code snippets) available for this?
P.S. I am using Zolertia Z1 motes.
The straightforward way is to use the xmem interface. The function prototypes are declared in the file xmem.h:
https://github.com/contiki-os/contiki/blob/master/core/dev/xmem.h
For the Z1 there is a platform-specific implementation of xmem in the platform's directory.
If you have never worked with flash memory before, note that a "rewrite" operation is typically not supported by the hardware. You need to erase a whole sector of the flash before you can write anything in that sector. Therefore, the typical usage pattern for dumping sensor data or logs is "write only at the end, never modify". When the current sector is full, erase the next one and write there, and so on until the whole flash is full.
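A short C sketch of that append-only pattern using the xmem calls declared in the header above (XMEM_ERASE_UNIT_SIZE comes from the platform configuration; the log's start offset, and the assumption that records never straddle a sector boundary, are mine):

    /* Append-only logging with Contiki's xmem interface. Assumes records
     * never straddle an erase-sector boundary; LOG_START is an arbitrary
     * offset chosen for illustration. */
    #include "dev/xmem.h"

    #define LOG_START 0UL  /* assumed start offset of the log region */

    static unsigned long write_offset = LOG_START;

    void log_append(const void *data, int len)
    {
        /* Erase a sector before the first write into it ("write only
         * at the end, never modify"). */
        if (write_offset % XMEM_ERASE_UNIT_SIZE == 0) {
            xmem_erase(XMEM_ERASE_UNIT_SIZE, write_offset);
        }
        xmem_pwrite(data, len, write_offset);
        write_offset += len;
    }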
Contiki also has the Coffee filesystem, which provides a higher-level interface if you need one.

How long does it take to set up an I/O controller on PCIe bus

Say I have an InfiniBand or similar PCIe device and a fast Intel Core CPU, and I want to send, e.g., 8 bytes of user data over the IB link. Say also that there is no device driver or other kernel involved: we're keeping this simple and just writing directly to the hardware. Finally, say that the IB hardware has previously been configured properly for the context, so it's just waiting for something to do.
Q: How many CPU cycles will it take the local CPU to tell the hardware where the data is and that it should start sending it?
More info: I want to estimate the cost of using PCIe communication services compared to CPU-local services (e.g., using a coprocessor). What I am expecting is that there will be a number of writes to registers on the PCIe bus, for example setting up the address and length of a packet, and possibly some reads and writes of status and/or control registers. I expect each of these to take several hundred CPU cycles, so I would expect the overall setup to take on the order of 1000 to 2000 CPU cycles. Would I be right?
I am just looking for a ballpark answer...
Your ballpark number is correct.
If you want to send an 8-byte payload using an RDMA write, first you will write the request descriptor to the NIC using programmed I/O (PIO), and then the NIC will fetch the payload using a PCIe DMA read. I'd expect both the PIO write and the DMA read to take between 200 and 500 nanoseconds each, although the PIO should be faster.
You can get rid of the DMA read and save some latency by putting the payload inside the request descriptor.
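As a purely hypothetical C sketch of that inline variant (the descriptor layout, field names, and doorbell mechanism are invented; real NICs define their own formats):

    /* Hypothetical: posting an "inline" RDMA write descriptor with
     * programmed I/O. The work-queue-entry layout and the doorbell are
     * invented for illustration. */
    #include <stdint.h>
    #include <string.h>

    struct wqe {                 /* hypothetical work-queue entry */
        uint64_t remote_addr;    /* destination address on the peer  */
        uint32_t rkey;           /* remote memory key                */
        uint32_t length;         /* payload length in bytes          */
        uint8_t  inline_data[8]; /* payload carried in the descriptor */
    };

    void post_inline_write(volatile struct wqe *nic_wq,
                           volatile uint32_t *doorbell,
                           uint64_t raddr, uint32_t rkey,
                           const void *payload)
    {
        struct wqe w = { .remote_addr = raddr, .rkey = rkey, .length = 8 };
        memcpy(w.inline_data, payload, 8);

        /* PIO: copy the descriptor into the NIC's queue; no DMA read is
         * needed because the payload travels inside the descriptor. */
        memcpy((void *)nic_wq, &w, sizeof w);
        *doorbell = 1;  /* ring the doorbell to start the send */
    }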
