STM32F4 - Configure external SRAM properly - memory

I have got an external SRAM on my STM32F43XX and I am able to use it. I can access the memory regions and test them (memtest).
However, I do not know if my FMC configurations are correct. It is hard for me to understand the relation between the datasheet of my SRAM and the STM32F4 FMC interface.
I use the STM32F4XX reference manual with the SRAM CY7C1051DV33.
Let's start with the timing (reference manual page 1591, Table 256 | SRAM datasheet page 6):
Address Setup <------- Address Setup to Write End?
Address Hold <------- Data Hold from Address Change?
Data Setup <------- Data Setup to Write End?
Bus Turn <-------- ?
Clock divide ratio <-------- ?
Data latency <----------- ?
AccessMode <------------- ?
Is the frequency of the SRAM defined by my HCLK divided by the clock divide ratio?
So if my HCLK is 100 MHz and the clock divide is 2, I get 50 MHz (20 ns). So the STM32F4's timing is always longer than the access time of the SRAM (max 10 ns). So would the lowest allowed value be okay everywhere?
Thank you in advance for your help!
By the way, my NORSRAM init looks like this:
init.DataAddressMux = FMC_DATA_ADDRESS_MUX_DISABLE;
init.MemoryType = FMC_MEMORY_TYPE_SRAM;
init.MemoryDataWidth = FMC_NORSRAM_MEM_BUS_WIDTH_16;
init.BurstAccessMode = FMC_BURST_ACCESS_MODE_DISABLE;
init.WaitSignalPolarity = FMC_WAIT_SIGNAL_POLARITY_LOW;
init.WrapMode = FMC_WRAP_MODE_DISABLE;
init.WaitSignalActive = FMC_WAIT_TIMING_BEFORE_WS;
init.WriteOperation = FMC_WRITE_OPERATION_ENABLE;
init.WaitSignal = FMC_WAIT_SIGNAL_DISABLE;
init.ExtendedMode = FMC_EXTENDED_MODE_DISABLE;
init.AsynchronousWait = FMC_ASYNCHRONOUS_WAIT_DISABLE;
init.WriteBurst = FMC_WRITE_BURST_DISABLE;
init.ContinuousClock = FMC_CONTINUOUS_CLOCK_SYNC_ASYNC;

Address setup is on the address bus: how much time before the clock the RAM is shown that the address has settled (no longer changes), and/or measured from the prior clock. Hold is how long after the clock it stays stable.
Data setup is how long before the clock the data must be stable.
The RAM and the microcontroller datasheets should have timing diagrams. For the clock speed you have chosen, check whether you meet timing and/or whether you have to set some parameters to meet timing.
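For a concrete picture, here is a minimal sketch of the corresponding HAL timing structure. The cycle counts are illustrative assumptions for a 100 MHz HCLK (10 ns per cycle) and a 10 ns asynchronous SRAM, and hsram1 is a placeholder handle; check every value against both datasheets.
FMC_NORSRAM_TimingTypeDef timing = {0};
timing.AddressSetupTime      = 1;                 // in HCLK cycles: 1 x 10 ns at 100 MHz
timing.AddressHoldTime       = 1;                 // only used in access mode D and muxed memories
timing.DataSetupTime         = 2;                 // 2 x 10 ns = 20 ns, above the SRAM's data setup time
timing.BusTurnAroundDuration = 0;                 // idle cycles between a read and the following write
timing.CLKDivision           = 2;                 // only used for synchronous accesses
timing.DataLatency           = 2;                 // only used for synchronous burst memories
timing.AccessMode            = FMC_ACCESS_MODE_A; // plain asynchronous SRAM access
HAL_SRAM_Init(&hsram1, &timing, &timing);         // ExtendedMode is disabled, so reads and writes share this timing
Note that for plain asynchronous accesses the timings are counted in HCLK cycles directly; CLKDivision and DataLatency only matter for synchronous (clocked) memories.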

Related

maximum memory range addressing confusion

I am confused on some memory addressing and location.
I have a board with a 16 KB EEPROM.
Correct me if I am wrong, but
16KB = 16 * 1000 * 8 = 128Kb
So to write at the maximum address (at 16000 decimal)
I write at address 0x3E80
Now why, on my board, can I write and then read at addresses above 0x3E80?
For instance, I write at 0xFFFF and read the value back.
The device is M24128-BW M24128-BR
Thanks
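A quick worked check of the arithmetic (a sketch; it treats 16 KB as 16 * 1024 bytes, and whether the M24128 silently wraps out-of-range addresses is an assumption to confirm in its datasheet):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint32_t size = 16u * 1024u;                                    /* 16 KB = 16384 bytes = 128 Kbit */
    printf("last valid address: 0x%04X\n", (unsigned)(size - 1u));        /* 0x3FFF, not 0x3E80 */
    printf("0xFFFF aliases to:  0x%04X\n", (unsigned)(0xFFFFu % size));   /* 0x3FFF, if the upper address bits are ignored */
    return 0;
}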

Do memory channels use separate pins for data bus in Intel i7-4600U?

The i7-4600U datasheet says that SA_DQ[63:0] is used for memory channel A and SB_DQ[63:0] is used for memory channel B. So my understanding is that memory channel A and memory channel B each use different processor pins for their own data bus.
Is my understanding correct?
The presence of SA_DQ[63:0] and SB_DQ[63:0] pretty much says it all.
There are two physical channels.
If you still need secondary "proof", you can also check the math of this statement in the datasheet:
Theoretical maximum memory bandwidth of:
— 21.3 GB/s in dual-channel mode assuming 1333 MT/s
— 25.6 GB/s in dual-channel mode assuming 1600 MT/s
With 1333 MT/s [1] and 8 bytes per transfer we get 1333 * 8 = 10664 MB/s, about 10.7 GB/s per channel.
In order to have a 21.3 GB/s peak, two channels must be used simultaneously, thus they must have separate pins.
[1] Note that mega-transfers per second is given by the internally multiplied bus clock (the multiplier is fixed to four) times two, due to the double data rate. So 1333 MT/s uses a 1333/2 = 666.67 MHz internal clock that corresponds to a 666.67/4 = 166.67 MHz bus clock.
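A quick sanity check of the dual-channel figure (a sketch of the arithmetic only):
#include <stdio.h>

int main(void)
{
    double mt_per_s = 1333.0;      /* mega-transfers per second, per channel */
    double bytes_per_xfer = 8.0;   /* 64-bit data bus = 8 bytes per transfer */
    int channels = 2;              /* dual-channel mode */
    double gbps = mt_per_s * bytes_per_xfer * channels / 1000.0;
    printf("peak bandwidth: %.1f GB/s\n", gbps);   /* ~21.3 GB/s, matching the datasheet */
    return 0;
}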

Calculate power consumption when the CPU doesn't change to LPM mode in Contiki

I need to calculate the power consumption of the CPU, according to this formula:
Power (mW) = cpu * 1.8 / time
where time is the sum of cpu + lpm.
I need to measure at the start of a certain process and at the end; however, the elapsed time is too short and the CPU doesn't change to LPM mode, as seen in the following values taken with powertrace_print():
all_cpu all_lpm all_transmit all_listen
116443 1514881 148 1531616
17268 1514881 148 1532440
Calculating the power consumption of the CPU, I get 1.8 mW (which is exactly the value of the current draw of the CPU in active mode).
My question is: how do I calculate power consumption in this case?
If the MCU does not go into LPM, then it spends all of its time in active mode, so the result of 1.8 mW you get looks correct.
Perhaps you want to ask something different? If you want to measure the time required to execute a specific block of code, you can add RTIMER_NOW() calls at the start and end of the block.
The time resolution of RTIMER_NOW() may be too coarse for short operations. You can use a higher-frequency timer for that, depending on your platform, e.g. read the TBR register for timing if you're compiling for an msp430-based sensor node.
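A minimal sketch of the RTIMER_NOW() approach (do_work() is a placeholder for the code under test, and the conversion assumes the interval is short enough that the tick arithmetic does not overflow):
#include <stdio.h>
#include "sys/rtimer.h"    /* Contiki rtimer API: RTIMER_NOW(), RTIMER_SECOND */

extern void do_work(void); /* placeholder for the block being measured */

void measure_block(void)
{
    rtimer_clock_t start = RTIMER_NOW();
    do_work();
    rtimer_clock_t end = RTIMER_NOW();

    /* RTIMER_SECOND ticks per second (often 32768), so convert ticks to microseconds */
    unsigned long us = (unsigned long)(end - start) * 1000000UL / RTIMER_SECOND;
    printf("block took ~%lu us\n", us);
}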

How to calculate the virtual memory of a computer system?

For example the following details are given regarding a computer system,
word length = 64 bits
internal registers = 32
address bus width = 60 bits
main memory = 1GB
With the help of the above details, how do I calculate the system's virtual memory?
Also, is it possible to calculate
the total memory occupied by the CPU's internal register set, and
the maximum number of operands that can be handled,
using the above data?
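For the first two items, the usual textbook arithmetic looks like this (a sketch only, assuming a byte-addressable memory and that each register holds one word):
#include <stdio.h>

int main(void)
{
    /* virtual address space = 2^(address bus width) addressable bytes */
    unsigned long long vspace = 1ULL << 60;   /* 2^60 bytes = 1 EiB */
    /* register file size = number of registers x word length */
    unsigned reg_bits = 32u * 64u;            /* 2048 bits */
    printf("virtual address space: %llu bytes\n", vspace);
    printf("register file: %u bits = %u bytes\n", reg_bits, reg_bits / 8u);
    return 0;
}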

Memory access after ioremap very slow

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.
I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:
#define RESERVED_REGION_SIZE (1 * 1024 * 1024 * 1024) // 1GB
#define RESERVED_REGION_OFFSET (1 * 1024 * 1024 * 1024) // 1GB
static int __init memdrv_init(void)
{
    struct timeval t1, t2;

    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday( &t1 );
    reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Set the memory to a known value
    do_gettimeofday( &t1 );
    memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Register the character device
    ...

    return 0;
}
I load the driver, and check dmesg. It reports:
[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec
That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?
This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.
EDIT:
The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
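For reference, the mmap side of that scheme usually looks something like the sketch below (not the poster's actual code; memdrv_mmap and the choice to hand the pages out with the default protection are assumptions):
#include <linux/fs.h>
#include <linux/mm.h>

static int memdrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    size_t len = vma->vm_end - vma->vm_start;
    unsigned long pfn = RESERVED_REGION_OFFSET >> PAGE_SHIFT;   /* reserved region from above */

    /* map the reserved physical pages straight into the calling process */
    return remap_pfn_range(vma, vma->vm_start, pfn, len, vma->vm_page_prot);
}
Whether this user-space mapping should itself be cached or uncached runs into the same trade-off discussed in the answers below.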
See my eventual solution below.
ioremap maps the region with uncacheable pages, as you'd desire for access to a memory-mapped I/O device. That would explain your poor performance.
You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
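As a rough illustration of that direction (a sketch only, not the poster's driver: the buffer size, names, and the assumption that no IOMMU sits between the PCI device and memory are mine; for device DMA, dma_alloc_coherent() on the struct device would normally be the preferred route):
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/printk.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB; kmalloc cannot serve arbitrarily large buffers */

static void *shared_buf;

static int alloc_shared_buf(void)
{
    shared_buf = kmalloc(BUF_SIZE, GFP_KERNEL);   /* normal, cached kernel memory */
    if (!shared_buf)
        return -ENOMEM;

    /* physical address a bus-mastering PCI device could be pointed at */
    pr_info("[memdriver] buffer at phys 0x%llx\n",
            (unsigned long long)virt_to_phys(shared_buf));
    return 0;
}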
I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio etc. It is not even guaranteed that the return is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for IO registers) leading to the terrible performance.
It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.
Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
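In rough C, the read side of that workaround looks something like the sketch below (assumed names throughout: ring is the remapped buffer, bytes_available() and consume() are hypothetical helpers, and 256 is the cache-line granularity mentioned above; how the buffer was actually mapped, cached or uncached, is not shown here):
#include <linux/io.h>
#include <linux/types.h>

#define LINE_SIZE 256   /* read granularity: one full cache line on that machine */

extern size_t bytes_available(size_t rd_off);   /* hypothetical: how much new data the device has written */
extern void consume(const u8 *buf, size_t len); /* hypothetical: hand the data to the rest of the driver */

static size_t rd_off;   /* current read offset into the ring buffer */

static void drain_ring(void __iomem *ring, size_t ring_size)
{
    u8 line[LINE_SIZE];

    /* only touch the shared mapping once a whole line of new data exists */
    while (bytes_available(rd_off) >= LINE_SIZE) {
        memcpy_fromio(line, ring + rd_off, LINE_SIZE);
        consume(line, LINE_SIZE);
        rd_off = (rd_off + LINE_SIZE) % ring_size;
    }
}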
I have tried doing huge memory chunk reservations with memmap.
ioremapping such a chunk gave me a mapped memory address space that is beyond a few terabytes.
When you ask to reserve 128 GB of memory starting at 64 GB, you see the following in /proc/vmallocinfo:
0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap
Thus the address space starts at 0xffffc9001f3a8000 (which is way too large).
Secondly, your observation is correct: even memset_io results in extremely large delays (tens of minutes) to touch all this memory.
So the time taken has to do mainly with address space conversion and non-cacheable page loading.
