Panther boot hang or container start-up not completing, waiting indefinitely in state: battery or light

Hi there,

I managed to flash an SD card (via Rufus) with husarion-ugv-rpi-ubuntu-24.04-v2.2.1b.img to replace the existing one in the Panther. As indicated in the user guide, I had to remove the single battery, which was then reinstalled correctly once the rear service compartment was closed after the SD card swap on the Raspberry Pi.

I followed the steps in the documentation to install the card and then set up the software side on the built-in computer: I ran first_boot_install.sh, which set up the Docker containers for the different components, including the webui snap. To get the latter working, I had to restore the RUTX11 router as indicated in the documentation and manually reset the user computer's static IP.

The install seemed successful and I managed to do a couple of shutdowns and boot-ups without errors while setting up:

  • chrony on the user computer (verification sketch below)
  • the shutdown_hosts.yaml configuration to synchronize the soft shutdown
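
For reference, a minimal sketch of how I check the time sync on the user computer (assuming chrony is already installed and the built-in computer is at its default 10.15.20.2):

# on the user computer: check that chrony uses the built-in computer as its time source
chronyc sources     # 10.15.20.2 should be listed as a source
chronyc tracking    # the Reference ID should point at 10.15.20.2 and the offset should be small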

I did several tests and this worked well until, at some point, the boot started hanging at a specific step on the built-in computer:

  • the front bumper LED lights remain solid white
  • the rear bumper LED lights remain solid red (perhaps with a small intensity variation)
  • the built-in computer can still be accessed via SSH and is responsive; however, I suspected something was hanging in the Docker container start-up sequence
  • the light sequence looked like the E-stop activation animation, but after unlatching the physical E-stop on the robot I tried to deactivate it via the remote control and could not. The remote control seems fine (battery level, flash responsive). The E-stop LED animation was also different on the original version delivered with the robot (flashing red lights)

I tried two different methods to restart the containers without doing a power cycle:

  • docker compose up -d --force-recreate on the built-in computer, as mentioned in step 15 of setting up the soft shutdown of a user computer. This showed the same behavior (hanging, with solid white at the front and red at the rear on the bumper lights).
  • docker compose down followed by docker compose up, which shows the progress of the container start-up and indicates where the process gets held up, as shown below (a log-collection sketch also follows this list):
    • [panther.lights_manager]: Waiting for required system message to arrive.
    • [panther.battery_driver]: Initialized battery driver using ADC data.
  • in some instances, I get an additional error message a bit before the previous one, but not always:
    • [ERROR][timestamp][panther.lights_manager]: SetErrorAnimation: Service with name 'lights/set_animation' is not reachable.
    • when waiting a long time, I sometimes get [battery_driver_node-7][WARN] [timestamp][panther.battery_driver]: An exception occured while reading battery data: Failed to read from file: /sys/bus/iio/devices/iio:devices1/in_voltage_raw
    • when this happens, I get the same message for in_voltage0_raw
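
For completeness, a sketch of the commands I used to restart and inspect the containers (run from the directory on the built-in computer that contains compose.yaml; husarion_ugv_ros is the driver container name from the stock image):

# restart the stack and watch the start-up logs in the foreground
docker compose down
docker compose up

# or, when the stack is running detached, follow the driver container's logs
docker logs -f --tail 100 husarion_ugv_ros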

Again, with either method the container start-up seems held up (hanging, solid white at the front and red at the rear on the bumper lights) with no info displayed in the web GUI.

The web GUI on 10.15.20.2:8080 shows the background but no data at all (no battery status, no E-stop status, …), while it worked on the few sequences I did after the first_boot install script.

Could you confirm the following points:

  • possible nature of the issue
  • if the robot is just in E-stop and not responsive to the remote control, how can I release the E-stop without the remote control (ROS 2 command)?
  • the latest image that should be used for the Panther install (I used husarion-ugv-rpi-ubuntu-24.04-v2.2.1b.img as mentioned in the documentation, but I also have husarion-ugv-rpi-ubuntu-24.04-v2.2.0.img and husarion-ugv-rpi-ubuntu-24.04-v2.2.1.img)

Robot version 1.23, ROS driver version 2.2.0, OS 24.04.2 LTS

Here are some screenshots:




I noted one difference between the compose.yaml file used to start the container and what is written in the documentation:

The command for husarion_ugv_ros is:

ros2 launch husarion_ugv_bringup bringup.launch.py common_dir_path:=/config

The command as shown in the documentation is:

ros2 launch husarion_ugv_bringup bringup.launch.py namespace:=panther shutdown_hosts_config_path:=/shutdown_hosts.yaml  # if adding the soft-shutdown synchronization with the user computer

There is a namespace argument in the documentation but not in the image; I am not sure whether this can impact the proper container start-up sequence.

Thank you for your help,

Nicolas

Hello @Nico2025,

1. Do I understand correctly that everything worked until the User-Computer was configured?
As for the animations, I will discuss them later.

Both commands have the same effect of starting the Docker containers; adding the -d flag starts them in the background, so the logs are not visible then.
The logs you sent are from system initialization and there is nothing to worry about. Some nodes depend on other nodes, and during initialization all nodes start up at the same time, so it may turn out that node A is waiting for node B, which is still starting up. For the error you mentioned, another attempt to establish the connection will be made.
As for the warning related to battery_driver, we are aware of it; fixing it is not trivial, and its consequence is that a single reading of the battery parameters fails. Since the reading is performed at a frequency of 100 Hz, missing one sample even every few seconds is not a major problem.

Since the 2.2.x release the animations have changed. Before the release they looked like this: old animations. They currently look like this: new animations. So I wonder whether the hang you mentioned is not just the change in animations between these versions.

As for the WebUI, it is clear that something is wrong here. Do I understand correctly that the WebUI worked right after reinstalling the system, and the problem started after integrating the user computer?

Regarding the aforementioned questions:

  • I am not yet able to determine the nature of the errors. Something that may not be intuitive for users is checking that there is no device with the ROS 2 Jazzy distribution in the LAN: the Panther driver, despite the natively installed Ubuntu 24.04 on the built-in computer, runs Humble. For the analysis of the error, please provide the following:
    2. Is the robot working: are topics visible, and can you control the robot using the teleop/gamepad?
    3. The full log of the Panther driver (docker compose up)
    4. The full log of the WebUI (sudo snap logs husarion-webui -n 100)
  • Regarding the E-stop, there are two E-stops: a hardware one (mushroom button on the robot) and a software one. You must first release the hardware E-stop and then release the software E-stop. For this purpose you can use:
    a) Terminal: ros2 service call ~/hardware/e_stop_reset std_srvs/srv/Trigger (see the sketch below)
    b) Logitech gamepad: LT+A (X mode)
    c) WebUI (once it is working)
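
For example, a minimal sketch of the terminal variant (assuming the default panther namespace; run it wherever the robot's services are visible, e.g. inside the husarion_ugv_ros container):

# release the software E-stop (the hardware E-stop must already be released)
ros2 service call /panther/hardware/e_stop_reset std_srvs/srv/Trigger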

Currently the namespace argument is not required: since the release we offer a common husarion_ugv_ros package for both our Lynx and Panther robots, with the setting based on an environment variable.

Hi @RafalGorecki, thank you for your response.

I updated the built-in computer to get the WebUI feature and the latest version. Our Panther was one of the last units without the WebUI implemented natively.

1. Everything worked with the delivered version, but I was advised to install the new version if we wanted access to the WebUI feature, hence the new SD card with the new image (HOFI was not working on our side, so I had to open the rear service compartment). Everything with the new image (2.2.1b) worked fine and I managed to follow the update process steps.

The only thing is that I am not able to get the WebUI working or get out of E-stop with the remote (the E-stop button being released/unlatched). I do have a new animation and may have interpreted it as a hang while the robot was just stuck in E-stop, but I had no way to get it out of E-stop (the gamepad E-stop release not working). I will retry with the terminal, gamepad and WebUI once accessible and report back. The hardware E-stop was released and not activated while testing.

The WebUI was working fine initially after first_boot.sh, then it did not show any info at all after a few restarts. Those restarts were to test the user computer configuration to shut down upon the built-in computer's request. I can still access the address, but no data is visible, as shown in the screenshot provided, hence I thought that something was not sharing any data and the container may not have started fully as expected. Do I need the same ROS 2 version on the machine using the browser to reach 10.15.20.2:8080 as the one on the built-in computer? (I would have thought it would not matter while using Foxglove, and that you could preview the data without having ROS installed.)

On my image (2.2.1b), now installed on the built-in computer, I believe the natively installed ROS 2 is Jazzy on Ubuntu 24.04 (ls /opt/ros/ shows jazzy, and echo $ROS_DISTRO shows jazzy). The Docker containers are using ROS 2 Humble. I cannot control the robot: the gamepad is not responsive, nor is teleop. The docker compose up output is what the screenshots and associated logs show.

I will recheck the E-stop; I am still unsure why the WebUI is not showing anything.

Thank you for your help,

Nicolas

Hello @Nico2025,

Everything indicates that the robot driver is not working or is not running. By default, the driver should start as a docker container. I’m sending some general information and recommendations.

ROS Communication
To communicate with other devices via ROS, it is necessary for the computer on the other side to have:

  • ROS 2 Humble installed (version 2.3.0 with support for Jazzy will appear soon, but version 2.2.1b requires Humble)
  • time synchronization (example here)
  • a connection in the same LAN
  • RMW_IMPLEMENTATION=rmw_cyclonedds_cpp installed and set (a verification sketch follows below)

When all these conditions are met, communication will work correctly.
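
A minimal sketch of how these conditions can be checked on the user computer (assuming ROS 2 Humble is already sourced):

# select the Cyclone DDS RMW
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp

# restart the ROS 2 daemon so it picks up the new RMW
ros2 daemon stop
ros2 daemon start

# the robot's topics (e.g. /panther/...) should now be listed
ros2 topic list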

Humble - Jazzy conflict
Here I would like to point out that there should be no communication between ROS 2 Humble and ROS 2 Jazzy in the LAN, so please temporarily disable all devices running ROS 2 Jazzy. The built-in computer has ROS 2 Jazzy installed natively, but its daemon should be disabled by default (see the check below).
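
A quick way to check this natively on the built-in computer (a sketch, not needed in normal operation):

# check whether the native (Jazzy) ROS 2 daemon is running, and stop it if it is
ros2 daemon status
ros2 daemon stop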

WebUI
If there is a problem with the driver, the WebUI will not work either. The same applies to the gamepad. So you should first focus on why the driver stopped working.

Suggestion
If nothing helps, I suggest reinstalling the system. I know you just did it, but it will be easier to start from a clean system. If the problem returns, I would ask you to describe the steps needed to reproduce the error. I will also add that we have updated HOFI, so this method should work now as well.

Hi @RafalGorecki,
I went through the entire reinstallation, putting 2.2.1b on the built-in computer.
The user computer has the same ROS version installed (Humble), is on Ubuntu 22, and the time sync is done (chronyc tracking shows the built-in computer's IP). It is on the same LAN and has RMW_IMPLEMENTATION=rmw_cyclonedds_cpp installed and set.

Now the Panther can be released from E-stop with the gamepad and can be moved via the gamepad.

However, I cannot release the software E-stop via the terminal commands indicated:

  • ros2 service call ~/hardware/e_stop_reset std_srvs/srv/Trigger
    • where exactly should this be run? In the husarion_ugv_ros container on the built-in computer? It keeps indicating "waiting for service to become available…". This is the case whether I run the command in the ugv container or natively on the built-in computer.
  • ros2 service call /panther/hardware/e_stop_trigger std_srvs/srv/Trigger from the user computer, as shown in the User Computer Setup Guide | Husarion

The physical E-stop button is released in both cases.

Likewise, teleop from the user computer has no effect. I can see all the topics, including /panther/cmd_vel, and I can run ros2 launch teleop_twist_keyboard teleop_twist_keyboard and get the TUI visible, but it has no effect on the Panther (E-stop is released as well).

For the web GUI, I can connect to the 10.15.20.2:8080 URL (built-in computer), which opens the Foxglove page but remains without any messages. What is the specific parameter to set in the Foxglove connection to get the GUI to work: ws://localhost:9090/? In other words, is the built-in computer supposed to have a websocket or ROS bridge activated by default to make that data visible, and if so, what are the parameters to access it when setting up the Foxglove connection?

I did a few more tests and activated a ROS 2 rosbridge web server as shown in Running Foxglove on ROSbot XL | Husarion. I installed rosbridge on the user computer and set up the Foxglove connection to ws://10.15.20.3:9090 (default port 9090) from the URL 10.15.20.2:8080 (so the Foxglove URL points to the built-in computer but looks at topics from the user computer websocket). I can start seeing the web GUI items being semi-live, but no information is visible directly. I have to restart the rosbridge on the user computer a second time, with Foxglove open, to get more reliable publishing. Only then can I see the IMU graphs, the battery voltage and messages, but the E-stop release (GO button) is not effective and the teleop interface is not working either, even with the hardware E-stop released.

I was also able to receive the feed from the Ouster LIDAR and visualize it in Foxglove (huge latency, but visible).

So, in short, I manage to get information from the built-in computer to the user computer, but I do not manage to send commands (E-stop release or teleop) in the opposite direction. I wonder whether it is a domain or namespace issue. So far I control and do all testing on the user computer via terminal (SSH from an external laptop).

Thank you again for your help,

Nicolas

PS: you mention that version 2.3.0 of the built-in computer image will support Jazzy; is that far off?

Hi,

I now got the WebUI on the built-in computer to work:
http://10.15.20.2:8080/ui, with Foxglove configured to use ws://10.15.20.2:8080/ws (foxglove_bridge). The /ui was missing in the URL, and this is the part that redirects to the correct Foxglove setup. Apologies for that.

Still no success with the teleop keyboard and the software E-stop reset.

Thank you,

Nicolas

Hello @Nico2025,
Firstly, I would like to apologize for the long wait.
Regarding your query about the service call: the correct service name is of course /panther/hardware/e_stop_trigger; ~ means "use your robot's namespace" (the default namespace is panther).
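
For example, a short sketch with the namespace expanded (assuming the default panther namespace):

# ~/hardware/... expands to /<namespace>/hardware/..., so with the default "panther" namespace:
ros2 service call /panther/hardware/e_stop_reset std_srvs/srv/Trigger    # release the software E-stop
ros2 service call /panther/hardware/e_stop_trigger std_srvs/srv/Trigger  # engage the software E-stop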

Regarding the communication problems between the user computer and the built-in computer, please let me know:

  1. Are topics from the built-in computer visible using ros2 topic echo ... from the user computer?
  2. Are topics sent from the user computer visible using ros2 topic echo ... on the built-in computer?
  3. Does the information being sent include the namespace? The /panther/cmd_vel topic should be the one published.
  4. Is the time of the messages from teleop synchronized with the built-in computer? A difference in date (or a delay in receiving a message) of more than 0.2 seconds will result in the received message being ignored. (A quick-check sketch follows this list.)
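
A quick-check sketch for points 1-4 (assuming the default panther namespace and chrony-based time sync; the test publication is only illustrative):

# 1-2. check topic visibility from the user computer (and the same from the built-in computer)
ros2 topic list
ros2 topic echo /panther/cmd_vel

# 3. when testing manually, publish with the namespace included
ros2 topic pub --once /panther/cmd_vel geometry_msgs/msg/Twist "{linear: {x: 0.1}}"

# 4. compare the clocks on both machines; the offset should stay well below 0.2 s
chronyc tracking
date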

I can also inform you that the image with Jazzy for the built-in computer is already available, and within a week or two a universal image for user computers running Ubuntu should appear: Operating System Reinstallation | Husarion

Hi @RafalGorecki,
thanks for your reply.

Just focusing on the topics:

  1. Topics from the built-in computer are not visible from the user computer.
  2. Topics from the user computer are not visible from the built-in computer.

Just a small summary to make sure I understand well.

Built-in computer:

  • the built-in computer is running ROS 2 Humble (image version 2.2.1b) from a container, husarion/husarion-ugv:humble-2.2.1-20250331, named husarion_ugv_ros. There is also a ROS 2 Jazzy version installed natively on this image, which is on Ubuntu 24.04. When doing ros2 topic list from the native terminal (so via the native ROS 2 Jazzy) I get:
    • a first section showing warnings (expected when different ROS 2 versions communicate) of the type [WARN] [time] [rmw_cyclonedds_cpp]: Failed to parse type hash for topic ‘rt/tf’ with type ‘tf2_msgs::msg::dds_::TFMessage_’ from USER_DATA ‘(null)’. If I repeat the command, these warnings no longer appear.
    • it then lists all the topics (including the /panther/… ones) as published by the ROS 2 Humble container.
  • the built-in computer's ROS 2 Humble container shows ROS_DOMAIN_ID=0 and ROS_LOCALHOST_ONLY=0, but those are the default values, so I would expect them to match the defaults of any other ROS 2 Humble installation (where they are not explicitly set) on the same network.

User computer:

  • in our case, the user computer is the ZED Box Orin NX 16GB (Jetson Orin NX 16GB). I have installed ROS 2 Humble on it. When doing ros2 topic list, I cannot see the built-in computer topics when the RMW is set to RMW_IMPLEMENTATION=rmw_cyclonedds_cpp.
  • previously, RMW_IMPLEMENTATION=rmw_cyclonedds_cpp was not set on the user computer, so it defaulted to rmw_fastrtps_cpp. In that case I was able to see the built-in computer topics from the user computer and vice versa. I reverted to rmw_fastrtps_cpp on the user computer and I can now again see the built-in computer topics from the user computer, and the built-in computer can see the user computer topics. So it is the RMW_IMPLEMENTATION change to rmw_cyclonedds_cpp that seems to be creating the issue.

Programming computer:

  • with a third computer used for programming, running Ubuntu 24.04 with a ROS 2 Jazzy install, I am able to see the ROS 2 topics published by the built-in computer when connected to the Panther Wi-Fi (on 10.15.20.X). However, when starting a ROS 2 node on the user computer (on Humble), such as the one used for the ZED X camera, I do not see those topics on the built-in computer or on the programming laptop if the user computer uses RMW_IMPLEMENTATION=rmw_cyclonedds_cpp.

All ROS installations are set up with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp, apart from the user computer, where I modified it for troubleshooting:

  • if RMW_IMPLEMENTATION=rmw_cyclonedds_cpp, I cannot see the topics between the user computer and the built-in or programming computer
  • if RMW_IMPLEMENTATION is not specified, reverting to the default, I can see the topics between the user computer and the built-in or programming computer

So I am not sure whether this is the expected behavior, as I was under the impression that setting the RMW to the same value (here rmw_cyclonedds_cpp) on all ROS installations would make ROS topics available to all connected nodes.

When doing ps -ax | grep ros-domain, all installations are using --ros-domain-id 0 on their running ROS daemons.
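
For reference, the checks I run on each machine (a sketch; husarion_ugv_ros is the driver container name on the built-in computer):

# which RMW and domain the native install uses
echo $RMW_IMPLEMENTATION
echo $ROS_DOMAIN_ID
ps -ax | grep ros-domain

# the same check inside the driver container on the built-in computer
docker exec -it husarion_ugv_ros bash -c 'ps -ax | grep rmw; env | grep -E "RMW|ROS_DOMAIN|ROS_LOCALHOST"'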

Thank you,

Nicolas

Hi again,

another remark: when the topics are visible,

  • ros2 run teleop_twist_keyboard teleop_twist_keyboard --ros-args --remap cmd_vel:=/panther/cmd_vel works to control the Panther from the user computer.
  • ros2 service call /panther/hardware/e_stop_trigger std_srvs/srv/Trigger triggers the software E-stop.
  • ros2 service call /panther/hardware/e_stop_reset std_srvs/srv/Trigger resets the software E-stop.

Hello @Nico2025,

Strange, but the good news is that it is most likely a matter of DDS configuration; more on that in a moment.

  1. Please check whether the env is set to RMW_IMPLEMENTATION=rmw_cyclonedds_cpp on the built-in computer in compose.yaml.
  2. I do not recommend checking topics natively on the built-in computer, because it is a different distro. It is better to log into the Docker container with docker exec -it husarion_ugv_ros bash.
  3. Also check that you restart the ros2 daemon after changing RMW_IMPLEMENTATION.
  4. DDS configuration
    From what I remember, the Jetson has a different Ethernet interface name than the standard eth*.
    From what I also remember, Cyclone DDS listens by default on only one interface, and I suppose it is not the right one. So it is necessary to manually set this interface to the correct one, or set the IP address.
    It will probably look something like this:
<?xml version="1.0" encoding="utf-8"?>
<CycloneDDS
  xmlns="https://cdds.io/config"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd"
>
  <Domain Id="any">
    <General>
      <Interfaces>
        <NetworkInterface address='10.15.20.2' />
      </Interfaces>
      <AllowMulticast>default</AllowMulticast>
      <MaxMessageSize>65500B</MaxMessageSize>
    </General>
    <Tracing>
      <Verbosity>config</Verbosity>
      <OutputFile>
        ${HOME}/dds/log/cdds.log.${CYCLONEDDS_PID}
      </OutputFile>
    </Tracing>
  </Domain>
</CycloneDDS>

Then export both:

  • export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp and
  • export CYCLONEDDS_URI="file://$HOME/CycloneDDS/my-config.xml"

I haven’t tested this configuration, but I hope it will work. For more info check:
https://cyclonedds.io/docs/cyclonedds/latest/config/network_interfaces.html#
Instead of the address field, you can try <NetworkInterface name="change_to_jetson_eth_port_name"/>.

Hi @RafalGorecki ,

  1. In compose.yaml there is a reference to an ENV_FILE in the common section, which defines RMW_IMPLEMENTATION=rmw_cyclonedds_cpp, so this is all good on that front.
  2. When checking inside the running container (docker exec -it husarion_ugv_ros /bin/bash), you can check which RMW is running with ps -ax | grep rmw; in our Panther's case it does show rmw_cyclonedds_cpp.
  3. Any time I change the RMW, I do ros2 daemon stop and ros2 daemon start.
  4. This is where it gets interesting: when using the default CycloneDDS file, such as the one shown at https://www.stereolabs.com/docs/ros2/dds_and_network_tuning#tuning-for-large-messages, the following default configuration was creating the issue:
<NetworkInterface autodetermine="true" priority="default" multicast="default" />

It seems that Cyclone DDS picks what it considers the best network interface, which in the case of the ZED Box is not the Ethernet one plugged into the built-in computer. So this line needs to be modified to:

<NetworkInterface name="eth0"/>

Setting the IP address directly did not work in my case. You can identify the available network interfaces using the ip link command.

So Cyclone DDS on the ZED Box needs the configuration adjustment shown above to avoid the auto-selection of a network interface, which here was not the Ethernet port communicating with the built-in computer. After adjusting it with the relevant interface name (here eth0), I can now see the Panther built-in topics from the user computer, and I can see the user computer topics from the ROS 2 container on the built-in computer. A consolidated sketch of the adjustment is below.
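
For anyone hitting the same issue, a consolidated sketch of the adjustment on the user computer (assuming the interface is eth0 and the config file lives at ~/CycloneDDS/my-config.xml as in the earlier example; adjust both to your setup):

# identify the interface connected to the built-in computer
ip link

# write a CycloneDDS config pinned to that interface
mkdir -p "$HOME/CycloneDDS"
cat > "$HOME/CycloneDDS/my-config.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="eth0"/>
      </Interfaces>
      <AllowMulticast>default</AllowMulticast>
      <MaxMessageSize>65500B</MaxMessageSize>
    </General>
  </Domain>
</CycloneDDS>
EOF

# point ROS 2 at it (e.g. add these lines to ~/.bashrc)
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI="file://$HOME/CycloneDDS/my-config.xml"

# restart the daemon and check that the Panther topics are now visible
ros2 daemon stop && ros2 daemon start
ros2 topic list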

Thank you so much for your help. I will be running a few more tests and report but it looks like it solved the issue.

Nicolas


Thank you for the information and I’m glad you managed to find the source of the problem. :hugs: