FreeRTOS stuck in vListInsert

I'm using FreeRTOS 10.0.1 on a CC1310 (Arm Cortex-M3) and have hit a really hard problem that I've been trying to solve for days.
I use the TI SDK and read data from an I2C device. The first read succeeds, but the second gets stuck in vListInsert(): pxIterator->pxNext points to itself, so the for loop is infinite.
The driver is waiting in SemaphoreP_pend(); if I set a breakpoint, I can see that the post gets called, but the kernel is just stuck.
I have set the SysTick and PendSV interrupt priorities to 7 (the lowest).
The I2C interrupt is priority 6.
configMAX_SYSCALL_INTERRUPT_PRIORITY is set to 1.
There is no stack overflow as far as I can tell.
Please help, how do I debug this problem?
Best regards
Jakob

This is almost certainly a problem with interrupt priorities causing the list to get corrupted. In your case the interrupt priority is stored in the top 3 bits of the 8-bit priority register (as there are 3 priority bits), so 7 is stored as 7 << 5 (0b11100000); you can pad the lower bits with 1s if you like, so priority 7 == 255. FreeRTOS handles this shift for you.
What I suspect is happening is that your I2C interrupt priority of 6 is not being shifted left by 5, so the register holds 0b00000110, which the hardware reads as priority 0 (the highest, since only the top 3 bits count).
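A minimal sketch of the difference, assuming a CMSIS-style environment (the I2C_IRQn value here is a hypothetical stand-in for your device header's IRQ number):

    #include "cc13x0_device.h"   /* stand-in for your device header (pulls in CMSIS) */

    #define I2C_IRQn ((IRQn_Type)5)   /* hypothetical IRQ number */

    void set_i2c_priority(void)
    {
        /* Correct: NVIC_SetPriority() takes the UNSHIFTED priority and
           shifts it into the top bits of the register for you. */
        NVIC_SetPriority(I2C_IRQn, 6);        /* stored as 6 << 5 = 0xC0 */

        /* Writing the register by hand, you must shift yourself: */
        NVIC->IP[(int)I2C_IRQn] = 6 << 5;     /* correct: 0xC0 */
        /* NVIC->IP[(int)I2C_IRQn] = 6;          WRONG: top 3 bits = 0,
                                                 i.e. the HIGHEST priority */

        /* Related gotcha worth checking (an assumption on my part, not
           something the post confirms): configMAX_SYSCALL_INTERRUPT_PRIORITY
           must also be given in SHIFTED form, e.g. (1 << 5) = 0x20, not 1. */
    }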

I solved the issue. After getting help from #realtime-rik, I decided to check all my interrupt priorities again. They were all OK, but in the process I discovered two things.
The TI SDK has structs with buffers in some of the drivers which are RTOS-dependent, so their sizes have to be set manually for each driver depending on the RTOS being used.
I called the board init function in main() before the scheduler was started, and inside board init one of the drivers was using FreeRTOS queues. I have moved board init into my thread now.
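A minimal sketch of that fix (Board_init() and the task details are stand-ins, not the exact TI SDK symbols): FreeRTOS queue calls are not safe before vTaskStartScheduler(), so the init moves into the task.

    #include "FreeRTOS.h"
    #include "task.h"

    extern void Board_init(void);   /* stand-in for the SDK board init */

    static void mainThread(void *arg)
    {
        Board_init();   /* safe here: the scheduler is running, so any
                           driver that creates or uses queues works */
        for (;;) {
            /* ... application work ... */
        }
    }

    int main(void)
    {
        /* do NOT call Board_init() here if any driver touches queues */
        xTaskCreate(mainThread, "main", 512, NULL,
                    tskIDLE_PRIORITY + 1, NULL);
        vTaskStartScheduler();
        for (;;) { }    /* should never be reached */
    }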

I ran into an issue with FreeRTOS getting stuck in vListInsert() when I accidentally disabled interrupts and then tried to disable them again. Make sure you don't have a call to taskENTER_CRITICAL() followed by portDISABLE_INTERRUPTS().
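A sketch of the pattern to avoid (my reading of the failure mode, not verified against the original poster's code):

    taskENTER_CRITICAL();        /* masks interrupts AND tracks nesting depth */
    portDISABLE_INTERRUPTS();    /* BAD: bypasses the nesting count, so the
                                    exit path below no longer matches */
    /* ... */
    taskEXIT_CRITICAL();         /* interrupt state is now inconsistent */

    /* Stick to one mechanism and keep the calls strictly paired: */
    taskENTER_CRITICAL();
    /* ... short, non-blocking work only ... */
    taskEXIT_CRITICAL();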

Related

Alternatives to D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS?

This is a follow-on to this question about using the DX11VideoRenderer sample (a replacement for the EVR that uses DirectX11 instead of the EVR's DirectX9).
I've been trying to track down why it uses so much more CPU than the EVR. Task Manager shows me that most of that time is kernel mode.
Using profiling tools, I see that a LOT of time is being spent in numerous calls to NtDelayExecution (aka Sleep). How many calls? ~100,000 over the course of ~12 seconds. Ok, yeah, I'm sending a lot of frames in those 12 seconds, but that's still a lot of calls, every one of which requires a kernel mode transition.
The call stack shows the last call in "my" code is to IDXGISwapChain1::Present(0, 0). The actual call seems to be Sleep(0) and comes from nvwgf2umx.dll (which is why this question is tagged NVidia: hopefully someone there can call up the code and see what the logic is behind such frequent calls).
I couldn't quite figure out why it would need to do /any/ Sleeping during Present. It's not like we wait for vertical retrace anymore, is it? But the other reason to use Sleep has to do with yielding to other threads. Which led me to a serious clue:
If I use D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS, the CPU utilization drops. Along with some other fixes, the DX11 version is now faster and uses less CPU time than the DX9 version (which is what I would hope/expect). Profiling shows that Sleep has dropped from >30% to <1%.
Unfortunately, this page tells me:
This flag is not recommended for general use.
Oh.
So, any ideas on how to get decent performance without using debug flags?
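For reference, the flag in question goes into the device-creation call, roughly like this (a sketch; error handling and the swap-chain setup are omitted):

    #include <d3d11.h>

    ID3D11Device        *device  = NULL;
    ID3D11DeviceContext *context = NULL;
    UINT flags = D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS;

    HRESULT hr = D3D11CreateDevice(
        NULL,                      /* default adapter */
        D3D_DRIVER_TYPE_HARDWARE,
        NULL,                      /* no software rasterizer module */
        flags,
        NULL, 0,                   /* default feature levels */
        D3D11_SDK_VERSION,
        &device, NULL, &context);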

How can I investigate failing calibration on Spartan 6 MIG DDR

I’m having problems with a Spartan 6 (XC6SLX16-2CSG225I) and DDR (IS43R86400D) memory interface on some custom hardware. I've tried the same design on an SP601 dev board and everything works as expected.
Using the example project, when I enable soft_calibration, it never completes and calib_done stays low.
If I disable calibration, I can write to the memory perfectly as far as I can see. But when I try to read from it, I get a variable number of successful read commands before the Xilinx memory controller stops executing the commands. Once this happens, the command FIFO fills up and stays full. The number of successful commands varies from 8 to 300.
I'm fairly convinced it's a timing issue, probably related to DQS centering. But because I can't get calibration to complete when enabled, I don't have continuous DQS tuning. So I'm assuming it works with calibration disabled until the timing drifts.
Are there any obvious places I should be looking to find out why calibration fails?
I know this isn't a typical stack overflow question, so if it's an inappropriate place then I'll withdraw.
Thanks
Unfortunately, the calibration process just tries to write and read content successively while adjusting taps internally. It finds one end of the successful range, then goes in the other direction and finds the last successful tap there, and then finally settles somewhere in the middle.
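Conceptually it works something like this illustrative sketch (not the real MIG state machine; set_dqs_delay() and write_read_compare_ok() are hypothetical stand-ins):

    #define NUM_TAPS 64

    extern void set_dqs_delay(int tap);        /* hypothetical helpers */
    extern int  write_read_compare_ok(void);

    void calibrate(void)
    {
        int first_good = -1, last_good = -1;

        /* sweep the DQS delay taps, recording the passing window */
        for (int tap = 0; tap < NUM_TAPS; tap++) {
            set_dqs_delay(tap);
            if (write_read_compare_ok()) {
                if (first_good < 0) first_good = tap;
                last_good = tap;
            }
        }

        if (first_good < 0) {
            /* no tap ever passes -> calib_done stays low; suspect signal
               integrity, trace length matching, or termination */
            return;
        }
        set_dqs_delay((first_good + last_good) / 2);  /* center of the eye */
    }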
This is probably more HW-centric as well, so I'll post what I think and let someone else move the thread.
Is it just this board, or are all of them doing it? Have you checked? If it's one board, and the RAM is BGA style, it could be a bad solder job. Push your finger down slightly on the chip and see if you get different results... after this it gets more HW-centric.
Does the FPGA image you are running on your custom board work on your devkit? A lot of the time that isn't practical, I know, but I thought I would ask, as it rules out that the image you are using on the devkit has FPGA constraints you aren't getting in your custom image.
Check your length tolerances on the traces. There should have been a length constraint, plus or minus 50 mils or something like that. No one likes to hear they need a board re-spin, but if those are out, it explains a lot.
Signal integrity. Did you get your termination resistors in there, and are they the right values? Don't suppose you have an active probe?
Did you get the right DDR memory? Sometimes a different speed grade gets used, and that can cause all sorts of issues.
Slowing down the interface will usually help the last two items (signal integrity and speed grade), so if you are just trying to get work done, you might ask for a new FPGA image with a slower clock.

Watchdog timeout during call to file.format?

This question is entirely unrelated to my code, but to satisfy the obligatory "show your code" directive:
file.format()
Before the call above returns, on this one SoC I always get a WDT reset. Sometimes, but not always, the flash does appear to be formatted when the chip is started again. And sometimes it freezes after the WDT reset message and has to be powered off (the output looks like wrong comm parameters after pressing hardware reset, but none of the terminal app options seemed to match).
(Note: since starting this draft I built another copy of my device, using another new, recently received ESP8266-12E, and it behaves identically. Previously built copies still work normally, with the identical firmware.)
So this must be a bad chip, right? Or maybe bad on-board flash? It is a brand new one I just bought. I've also seen file.write issues, with buffer size always 255 bytes or less, though no read issues at all.
One other quirk: after burning a cloud-built NodeMCU image to this ESP8266-12E device, adc.read returned 65535 and adc.readvdd33 returned an apparently valid value. (I corrected that by burning esp_init_data_default.bin to 0x3FC000.) This was the first (out of 15, maybe 20) I have seen like that. I did not check whether an older version of NodeMCU was already on it.
This wouldn't be the first chip with which I've had issues on arrival; it's at least the 2nd, likely the 3rd or 4th.
So maybe the larger question, what percentage of the ESP8266's that you buy, are either DOA or suffer infant mortality? (Not counting the ones that you have reason to believe were inadvertently killed.)
The problem can be something other than the ESPs, like an inadequate power supply. I know from my own experience that the Arduino Uno and most USB-TTL converters cannot safely deliver enough current to ESPs. If you're not already, consider using a dedicated power supply circuit connected to a USB power source.
It does indeed seem to be a hardware issue, 2 bad out of 6, not good! I think it might be a certain vendor but don't want to name names without being sure... Whatever is wrong with the chip hangs it up long enough to make the watchdog bark.
Much more than the cost of the part, the time consumed figuring out whether it's the Lua code, the firmware, supporting connections, peripherals, or the chip itself is the costly thing (not to mention frustration, and wasted storage on SO).

WatchDog Timer in Beaglebone Black

I am using a BeagleBone Black in a project and wanted to ask if anyone knows the limits of the internal WDT (watchdog timer). What can it do, and what can it not do? I'm a beginner with the BeagleBone and WDTs...
Thanks!
Quoting from "AM335x Sitara™ Processors - Technical Reference Manual":
The watchdog timer is an upward counter capable of generating a pulse on the reset pin and an interrupt to the device system modules following an overflow condition. The watchdog timer serves resets to the PRCM module and serves watchdog interrupts to the host ARM. The reset of the PRCM module causes a warm reset of the device.
Essentially the WDT is a counting device: a hardware register whose value is incremented automatically at an accurate, fixed frequency. There is also a hardware comparator whose job is to trigger an IRQ every time the WDT overflows. The difference from a traditional timer is the default action taken on that IRQ: in the WDT's case, it is to reset the board.
The main goal of the WDT is to react to error situations in which the runtime environment (or the kernel) is frozen and no longer responding. When this happens the runtime does not reset the WDT, so it overflows, raises an IRQ, and the board is reset so that the runtime environment can regain control of it.
To use this feature (and you're obliged to, if you don't want your board reset every x seconds), you'll have to write a value to the WDT_WTGR register (hardware address 0x44E35030) to cause a time counter reload and avoid the reset. I noticed that the WDT overflows after approximately 50 seconds on the BeagleBone Black, so you'll have to write a value every x < 50 seconds.
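A bare-metal sketch of the idea (register offset taken from the TRM; worth double-checking against section 20.4 — my reading is that each write must differ from the previous value to trigger the reload):

    #include <stdint.h>

    #define WDT1_BASE  0x44E35000u
    #define WDT_WTGR   (*(volatile uint32_t *)(WDT1_BASE + 0x30))

    /* reload ("kick") the watchdog counter */
    static void wdt_kick(void)
    {
        static uint32_t trigger = 0;
        WDT_WTGR = ++trigger;   /* a new value each time forces a reload */
    }

    /* call wdt_kick() more often than the ~50 s overflow period, e.g.
       for (;;) { do_work(); wdt_kick(); } */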
However, this only applies if you plan to run a bare-metal application on the board. In other words, the WDT is correctly handled by U-Boot (the BBB's default boot loader) and by the Linux kernel, so in those cases you won't have to worry about it.
I hope I have taken away your doubts! :-)
Further reading:
http://www.ti.com/lit/ug/spruh73m/spruh73m.pdf - section 20.4

Trouble reading memory

When I run my code through the debugger, after a series of steps it eventually gets lost and executes commands out of order. I'm not sure if the stack is overflowing or what.
This is the error I usually get:
MSP430: Trouble Reading Memory Block at 0xffe2e on Page 0 of Length 0x1d2: Invalid parameter(s)
Any suggestions on what it could be? I read briefly about possible issues with not handling some interrupts.
Also, I'm trying to fill my RAM with a specific value so that I can tell if the stack is overflowing. Any suggestions on how to fill the entire RAM with, say, a value of 0x1234?
Thanks!
What debugger and compiler are you using? I've found that msp430-gcc and msp430-gdb/gdbproxy can get very confused with GCC optimizations turned on. However, broken code is sometimes emitted even without them turned on (it's a quality product, really).
The easiest way to fill memory is to modify your crt0.s startup file and link it yourself. Where memory is set to 0, you can change the fill pattern there.
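If you'd rather not touch crt0.s, here is a C sketch of the same idea (the region symbol and size are hypothetical; match them to your linker script, and only paint memory that isn't in use as live stack when paint_ram() runs):

    #include <stdint.h>

    #define PAINT_WORDS 256                 /* illustrative size */
    extern uint16_t __paint_region[];       /* hypothetical linker symbol
                                               for the RAM below the stack */

    void paint_ram(void)
    {
        for (unsigned i = 0; i < PAINT_WORDS; i++)
            __paint_region[i] = 0x1234;
    }

    /* later, count how many words (from the bottom up) still hold the
       pattern; 0 means the stack has likely grown through the region */
    unsigned untouched_words(void)
    {
        unsigned i = 0;
        while (i < PAINT_WORDS && __paint_region[i] == 0x1234)
            i++;
        return i;
    }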
Which device are you using? On the 16-bit devices, 0xffe2e is outside the address space of the processor; most likely an array index or similar has gone negative.
I have seen this error as well when using code composer studio and TI's USBFET programmer although I have not been able to nail down a single, definite cause.
Assuming you are using CCS, here are some tips:
1) Catch the ACCV (UNMI) and VMA (SYSNMI) interrupts and set a breakpoint within the handlers (see the sketch after this list). If one of these trips, examine the stack for clues as to what triggered the interrupt.
2) If you have any interrupt handlers which re-enable interrupts (GIE bit), make sure they are not being retriggered repeatedly.
3) I have seen this error (inexplicably) when stepping through optimized code, so it may help to turn off optimizations.
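A sketch of the handlers from tip 1, assuming a 5xx/6xx-family device and the CCS compiler (vector and flag register names vary by family; check your device header):

    #include <msp430.h>

    /* flash access violations (ACCV) and oscillator faults land here */
    #pragma vector=UNMI_VECTOR
    __interrupt void unmi_isr(void)
    {
        volatile unsigned int cause = SYSUNIV; /* identifies the source */
        (void)cause;
        while (1) { }   /* set a breakpoint here; the stack is preserved */
    }

    /* vacant memory accesses (VMA) land here */
    #pragma vector=SYSNMI_VECTOR
    __interrupt void sysnmi_isr(void)
    {
        volatile unsigned int cause = SYSSNIV;
        (void)cause;
        while (1) { }
    }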
If you are using Code Composer Studio, as an alternative to initializing your RAM, you can set a breakpoint on stack overflow. Also, with a paused debug session, CCS gives you the option to fill a portion of memory with any value you choose via the "Memory" sub-window.
