ESP8266 Constantly Restarting - esp8266

I have been struggling for some time now trying to get my ESP8266 ESP-12 to work. I was able to get it loaded with the NodeMCU software. Now, the board constantly restarts itself. Whether I have a script loaded on it or not, the module seems to continually restart. I am using ESPlorer, and can see it get connection to NodeMCU. Then the board restarts several seconds to several mins later. I have tried various pinout, capacitors, etc. with no luck in solving this problem. I have been searching all over and have had no luck finding a solution. Any help is greatly appreciated. Here is my current pinout:
ESP-12 ----------- TTY 3.3v Serial
================================================
TX ----------------------------- RX
RX ----------------------------- TX
GND, GPIO15 -------------------- GND
VCC, CH_PD, GPIO0, (RST) ------- LD1117v33 voltage regulator +3.3v
GND, GPIO15 -------------------- LD1117v33 voltage regulator GND
Thanks so much in advance for any help!

Assuming the hardware is okay and the right binary is loaded it's almost surly a power problem.
1) Make sure what ever voltage regulator you're using is rated for 200mA or more. In your case the LD1117 can source 800mA so that's good.
2) Make sure you're upstream power supply can source 200mA or more. If you're powering from a USB hub make sure the hub is powered.
3) Make sure you have some large low ESR capacitors across GND and 3.3v. Two capacitors: 10uF and 100uF worked for me (there's nothing magic about these exact values, 10-100uF should work). The ESP8266 can draw huge (relatively) amounts of current for short periods while booting or transmitting. This can cause a bad transient on the power supply, which will cause the system to reboot, which can lead to an infinite reboot cycle.

ESP8266 running lua goes to panic mode if the program loaded on it is has some bug.
Look at your code again. Reflash the firmware and upload code again. Try to upload code bit by bit. So that you know which part is causing the issue.
fix the setup in such way that flashing firmware is super easy. Trust me you will need to reflash it many times if you wanna play with code on it.

I had a NodeMCU dev board which worked fine for some hours, then suddenly restarted and wouldn't stay up. I tried adding power-supply capacitors and using a different power supply, to no avail.
What fixed it for me was resetting the watchdog timer every second:
tmr.alarm(6, 1000, 1, function() tmr.wdclr() end)
The watchdog timer needs to be reset periodically. I don't know how often. My device was resetting after about 35-40 seconds uptime. My code (which ran every 30 seconds from timer) was resetting the watchdog itself. This was not enough, somehow.

Use a pullup resistor on the RST line rather than just connecting it directly to VCC. I used 4.7K, but the actual value is not critical.

Get the serial terminal program named "terminal v1.9b by br#y++". While I wrote this answer I was not able to download. When I find the link I'll add in a comment.
Run the program and set the baud rate to custom and enter the value 74880 or 74400. With this you'll be able to see the fw messages. In this messages there is the reboot reason code. The codes are :
0 -> normal startup by power on
1 -> hardware watch dog reset
2 -> software watch dog reset (From an exception)
3 -> software watch dog reset system_restart (Possibly unfed wd got angry)
4 -> soft restart (Possibly with a restart command)
5 -> wake up from deep-sleep
Looking at the provided code you can decide from what reason the chip is restarting.

If your hardware is good, then the problem should be inside your code.
And sometime your code takes too long to finish, then it will trigger the watchdog to restart.

I suggest that you connect your reset pin to 3.3v via a 10K ohm resistor and to ground via a push button. This way your reset pin is always pulled high to prevent the random resets. I assume that your code has no bugs.

Related

FreeRTos High Frequency ISR

Can anyone help me do a task with high(like 6kHz) execution rate?
Need to do a SPI transmission on this frequency(the task code is already written). I can achieve over 7kHz without any control(just one task with no timing control, running full time), so the time is not a problem.
The problem is that the TICK_RATE has a resolution of ms, which is too high for what I need. Doing some research I found that reducing the time resolution will cause an unwanted overhead.
So, the way would be using an ISR. Is that right? Couldn't find an example of how do that. I have almost null experience in FreeRTos.
Using Toradex FreeRTOS version in Toradex IMX7D.
Thanks in advance.
Are you asking how to do this using FreeRTOS? in which case the FreeRTOS book has examples, as does the website (this is just one way of doing it). However, as you point out yourself, due to the frequency you really need to be doing this in an interrupt - in which case you need to review the hardware manual to see what facilities the hardware has in regards to DMA'ing data to peripherals, etc.
You need to express your task more clear. What MCU? Two side transmittion? Do you have DMA?
You can try to use timer of your MCU to perform timing and in it's ISR run
xSemaphoreGiveFromISR.
In RTOS task put listener xSemaphoreTake( xSemaphore, LONG_TIME ) == pdTRUE
Resolved it based on the solution in examples/imx7_colibri_m4/driver_examples/gpt(Toradex FreeRTOS version).
Just used GPTB derived from ccmRootmuxGptOsc24m clock. This is important because linux kernel was hanging on startup using the default Pfd0 clock.
To get the frequency I needed just divided GPTB frequency by the desired frequency and passed to GPT_SetOutputCompareValue().

MCP2515 Baudrate Issue

I have to fight that problem for a long time now.
There are 2 MCP2515 CAN Interface Chips connected to each other. The one is controlled by Arduino, the other one by STM32 board.
Scheme: (-> := send)
Arduino->MCP2515->MCP2515->STM32
If I set the baudrate on Arduino to 50k and on STM32 to 50k there is no receive interrupt on the second MCP2515.
When I double the baudrate on Arduino to 100k there will be an interrupt and the data is correctly transferred.
The strange thing is: CFG1 CFG2 CFG3 Register Settings are identical on both MCP2515 Chips!
Sure I can double the Frequency all the time but baud's like 31K25 need 62K5 which is not in the library.
Hope someone encountered the same issue or can help out with this.
I also tried this code for Baudrate references:
https://github.com/latonita/arduino-canbus-monitor/blob/master/mcp_can.cpp
by the way: both run on 8MHz Crystal Oscillators
problem solved partially, the double frequency was because Arduino IDE was using headers in the lib directory not the custom directory outside of the folder!
if I go to 10kBaud or below the interrupt now doesn't respond. Is it maybe too low to be handled?

PIC32 becomes unresponsive after a few hours

I have a PIC32MX340F512 board developed by another company for us, The board has a DS1338 RTCC and 24LC32A eeprom, and display unit on an I2C bus, on this bus i included a TSL2561 I2C light sensor, i wrote code in c to poll the light sensor continously , when the light sensor reaches a certain level i save the time and date and light sensor value on SD card. This all works fine but if i leave the system without exposure to light inside tunnel where incident light on one end of the tunnel is ought to be monitored the system becomes unresponsive no matter how much amount of light you apply and then if i switch power off and back on again everything starts to work normal. i am a one man development team and have been trying to find out the problem for months, i activated the watchdog timer to prevent the system from hanging but the problem still persisted. i then decided to find out if the problem is with the sensor by including a push button to activate light measurement but still when 4-5 hours elapse the PIC cant even detect a change in the the input pin. Under the impression that a hardware reset overrides anything going on i included a reset button and it also works ok for the first few hours after that the PIC doesn't seem to be responding to anything including a reset. I was getting convinced that there is nothing wrong with the firmware but also with all this happening the display unit (pic16f1933 and lcd) on the I2C shares power with the main unit and doesn't seem to be affected as it alternates between different messages constantly Does anybody have an idea what could be wrong (hardware/firmware or my sensor). I am using a 24v DC power supply purchased seperately. The PIC seems to go into a deep sleep although i dd not implement any kind of SLEEP mode in my code. Nb We use the same board for many other projects and i haven't come across such a problem . Thanks in advance.
I think you need to (if you haven't already) explore the wonderful world of in-circuit-debugging (such as with the ICD3 or PICkit 2/3). It allows you to run the processor in a special mode that lets you pause execution, see exactly which line of code is being executed, inspect variable values, and step through the code to see which parts are running and not running, or see exactly where execution takes a wrong turn. If the problem takes hours to reproduce, that's okay. You can just leave it overnight running in debug mode and hopefully it will be locked-up or 'sleeping' in the morning. At this point, you will be able to pause the processor and poke around to see if you got caught in some kind of infinite loop or something. This is often the only way to dig inside a running piece of code to see why things aren't working as you expect. But as you say, those bugs that take hours or days to manifest are the trickiest. Good luck!
It sounds like you can break up your design into two main parts, sd card interfacing, reading the rtc and reading the light sensor. If it were me I would upload a version of the code that mimics reading the light sensor but only returns fake data and see if that cures the problem. Additionally do the same with the other two modules separately and see if any of the three versions of your project not show this problem. From there just keep narrowing it down until you find the block of code thats causing problems.
if Two or more versions of your debug code show the same problem then my guess is it has to do with one of the communication protocols. I had a problem with a Pic32 silicon version blocking when using the DMA in conjunction with the SPI peripherals. So I would suggest checking the errata for your chip.
If you still cant find the problem, my only suggestion would be to check for memory leaks or arrays that are growing into reserved memory.
Hope that helps, good luck!

What happens if a bus-off error occurs in a CAN controller while a car is in motion?

I know that in a CAN controller if the error count reaches some threshold (say 255), bus off will occur which means that a particular CAN node will get switched off from the CAN network. So there won't be any communication at all. But what if the above said scenario happens while the car is moving which contains the ECU (includes the CAN controller)?
Is there any auto-recovery mechanism in a CAN controller to avoid any of the above situations?
During bus off, the node will be isolated.
CAN waits for the mandatory time period, 128 x 11 bits (1408 bits - 5.6 ms for a 250 kbit/s system) of time, and then tries to re-initialize the node.
Yes, if a CAN Tx error count reaches 255, a node will turn off and potentially reset itself. A good implementation will not continue resetting a node if the problem persists.
In addition to this safety mechanism, ECU's (electric control units) also time the duration between valid transmissions of the messages they expect to receive. Therefore, if the engine controller goes offline, nearly every ECU in the vehicle will report "Lost Communication with the Engine Controller."
Typically, these type of CAN problems are identified by DTC's (diagnostic trouble codes) beginning with U, like this one: http://www.obd-codes.com/u0115
Depending on the severity of the issue, the vehicle might enter a "limp home" mode, or might be totally disabled. Problems with the CAN bus on a vehicle are extremely rare, unless there has been some tampering.
The recovery mechanism depends on the software stack that's being used. Most new vehicles have AUTOSAR compliant software implementations. In the AUTOSAR communication stack, the CanSM (state manager) module has configurable BusOff Monitoring and Recovery. You can read more at http://autosar.org .
A BusOff however, is a serious situation in a running vehicle. How this is handled at the vehicle level is very specific to the system design. But, in most cases the system would go into a safe mode of operation and all parameters would take pre-set fault values to let the vehicle run with a reduced functionality. You would see the warning lamps on the dash go off to alert the driver. ECUs typically comply with some level of ASIL (https://en.wikipedia.org/wiki/Automotive_Safety_Integrity_Level) standard. This makes sure that such situations are thought of and taken care of during design and development.
Nothing spectacular will happen, even if the Engine Control Unit looses CAN communication. The car will continue running.
When bus-off occurs, the CAN network isolates that node and then resets that node which can able to start communication.
As you mentioned, after reaching a specific error count, that node gets disconnected/prohibited from transmitting anything on the bus. This is a description for the bus side.
On the controller side, every CAN controller generates an interrupt on BUS_OFF. It is the controller's responsibility that it should reset the CAN controller and bring it back to the normal state.
This is strictly followed for every CAN controller in any car. And this all happens in a few milliseconds... So for the driver, nothing happens!
When the ECU detects a BUS_OFF fault, the ECU should stop its emissions so this is a good question to ask.
There is an auto-recovery mechanism:
For the first three detections, the CAN controller resets its registers without a delay
For the next detections, the ECU waits 1 second before the reset
There is something called limp-home mode for the cars. That is the condition when all the ECUs fail in the car network. Then a set of default parameters for the ECUs are initialized and then the system, i.e., your car can continue running only for some time before it is properly serviced by the OEM.
I know this is an old thread, but the answers are a bit different from the situation I have observed, in relation to the OP question.
From experience, I'm have an issue where my ECU stops communicating with the diagnostic tools while the engine is running, apparantly it has entered the CAN off state. The only reason I know is I have a OBD 2 plug in monitor for engine parameters. I don't get ANY DTC, well most of the time anyways.. sometimes I get DTCs that are not applicable to my vehcile, and some U codes.
That said, the vehicle continues to run just fine, and if I didn't have the plug-in monitor, I would have no idea there was a problem! I'm now pretty sure the ECU for the Engine is having communication problems, and hitting the error counter and shutting off, it's the only thing that makes sense. I checked the CAN signals with a 2 channel O-scope, and they are a bit noisy compared to one of my other cars, so my next step is to swap the ECU and see if that fixes it. I already swapped out the TIPM (Total Integrated Power Module), it serves as a router of sorts between the 2 CAN networks, to the OBD2 port. That apparantly wasn't it.
if a CAN Tx or RX error counter reaches 255 , the node will turn off and be isolated
What happens if a bus-off error occurs in a CAN controller while a car is in motion?
1)HARD SWAPPING can be done in can network.
eg: Assume four(4) nodes(ECUS) are connected in can bus network.if we disconnected one
ecus then also can bus works properly.
2)In BUSS OFF condition it can hear every signal on the bus network but it cant transmit
mssgs(signal). If the car in motion or in rest position.
eg: Ecus(ABS) are using for better performance but actual work is done by actuator(DISK BRAKE).

How to detect disassociation by AP reboot within station in PS mode

I'm writing a fairly low-level driver for a wireless card, and while most of the spec is fairly straightforward, I haven't wrapped my head around a single question yet:
If my station is in power-save mode and its receiver is turned off for an extended period (say, 10 seconds) between DTIM frames, and the access point is rebooted in the meantime so my association is lost, how can I detect this?
I'm aware that the most common case will be that synchronisation is lost thoroughly enough that I will miss a number of beacons and simply go back to the AP search afterwards, but if by some lucky chance I get to see beacons, is there some way to find out that this is a new "instance" of the same AP?
I can think of
a short(er) TIM field -- however I believe APs are allowed to shorten the TIM information if no traffic is waiting
the AP timestamp changing unexpectedly.
the "number of beacons to next DTIM" field changing unexpectedly.
Being a perfectionist, I'd like to know if there is an entirely reliable way to detect that the AP has been rebooted, rather than just putting together clues.
I would suggest that you look at the TSF in received beacon frames and
if it differs too much from the TSF you expected you send a NULL-data
frame to the AP. If the AP was rebooted it should respond with a
deauthenticate frame with reason "Class 2 frame received from
nonauthenticated STA".
I don't have any knowledge of wireless cards at that level, but I'd take a practical route and analyze the communication from the AP just leading up to the disconnect for a pattern that matches a typical shutdown sequence; for example, "no more traffic, a sudden loss of DTIM sync, and then an AP announcement".
Off the top of my head: maybe look into Kismet's AP detection and analysis code for an idea or two. I'd bet someone else has encountered this problem before.
Cheers!

Resources