Bullseye update

I was previously planning on working on 64-bit support before trying to upgrade to bullseye, but I decided to switch that around and try to get us onto the latest version of Debian (11, aka bullseye) first.

The punchline is that I believe I am close to getting it working. I’m not sure when I’ll have an image ready for testing, but when I do, I’ll post back here.

Major obstacle overcome

The big issue with bullseye is that Debian removed the libraries needed by the web browser (kweb) that displays the UI on the LCD screen. I’ve solved that by switching to the Midori browser and writing a little service to make sure the URL bar is hidden.

Minor obstacle overcome

The next issue I came across is that the rule for updating the sensor values called .floatValue() on an object that may not have a .floatValue method. I’ve solved that one as well and have submitted the patch to @HestiaPi.

MQTT change

Mosquitto (the MQTT broker) now appears to require authentication by default. I enabled anonymous access in the mosquitto config, and after restarting, OpenHAB2 was able to connect without any problem. With that working, the UI on the LCD showed the temperature and let me access the heating/cooling/fan menus.
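For anyone making the same change by hand, here is a minimal sketch. It assumes Debian's stock /etc/mosquitto/mosquitto.conf (which pulls in /etc/mosquitto/conf.d) and Mosquitto 2.x (which bullseye ships), where allow_anonymous defaults to false as soon as a listener is configured. The drop-in file name is hypothetical, and anonymous access is only reasonable on a trusted network.

```shell
#!/bin/sh
# Sketch: let OpenHAB2 connect to Mosquitto 2.x without credentials.
# ASSUMPTIONS: Debian's stock /etc/mosquitto/mosquitto.conf, which contains
# "include_dir /etc/mosquitto/conf.d"; the drop-in file name is hypothetical.
CONF_FILE="${CONF_FILE:-/etc/mosquitto/conf.d/hestiapi-anonymous.conf}"

anon_mqtt_conf() {
  # Since Mosquitto 2.0, allow_anonymous defaults to false whenever a
  # listener is configured. (With no listener at all, the broker accepts
  # only local connections, anonymously.)
  printf '%s\n' 'allow_anonymous true'
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
  anon_mqtt_conf | sudo tee "$CONF_FILE" >/dev/null
  sudo systemctl restart mosquitto
fi
```

Run it with --apply to write the file and restart the broker; without arguments it only defines the function.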

Remaining work

I just got the MQTT issue fixed and haven’t looked into the remaining problems yet. They might be easy to solve, or they might be a total slog; time will tell.

  • The heating and cooling don’t seem to be turning on when they should
  • The LCD seems to be slower to respond, noticeably so

Update: I have a functional* bullseye image. OpenHAB2 works as expected, the web interfaces work, the LCD works*, and there’s no URL bar on the LCD UI (after the page loads).

* It does turn the HVAC on and off, but the LCD responsiveness is completely unacceptable.

To be more specific, the LCD responds to the first touch after about 20-30 seconds. However, once it finally starts paying attention to the user tapping the screen, it becomes acceptably responsive, with only about a second of lag. If you leave it alone for a couple of hours and come back, it is laggy for the first touch again, but then works okay. So this is not just a problem when it first boots.

Why so slow?

My best guess is that the screen is slow because the LCD UI is so CPU-heavy. I have a Raspberry Pi Zero 2 W soldered into a HestiaPi board here that I’ve been tinkering with; it boots much quicker, but it has the same lag issue, albeit to a lesser degree (15-20 seconds for the first tap).

The load averages are also high, but that was also true on the Buster image, and the LCD seemed to work acceptably well there. The memory usage is high, but when I added additional swap space, it wasn’t really used.

As mentioned above, bullseye uses Midori instead of kweb, so this, in combination with the LCD UI being a bunch of JavaScript, seems to be the reason for this new LCD issue.

Performance info about the Zero and Zero 2
pi@raspberrypi:~ $ free -h ; uptime # raspberry pi zero W
               total        used        free      shared  buff/cache   available
Mem:           429Mi       239Mi        76Mi       0.0Ki       114Mi       139Mi
Swap:           99Mi        85Mi        14Mi
 04:27:28 up  6:12,  1 user,  load average: 1.99, 1.92, 1.92
pi@raspberrypi:~ $ # top reports 36% of memory is OpenHAB2, 32% by midori
pi@raspberrypi:~ $ # majority of CPU time is used by midori
pi@raspberrypi:~ $ # midori 29 minutes of CPU time, OpenHAB2 = 8 min
pi@raspberrypi:~ $ free -h ; uptime # raspberry pi zero 2 W
               total        used        free      shared  buff/cache   available
Mem:           427Mi       273Mi        51Mi       0.0Ki       101Mi       100Mi
Swap:           99Mi        95Mi       4.0Mi
 04:27:27 up  9:49,  1 user,  load average: 2.27, 2.01, 1.99
pi@raspberrypi:~ $ # top reports 45% of memory is OpenHAB2, 24% by midori
pi@raspberrypi:~ $ # majority of CPU time is used by Midori
pi@raspberrypi:~ $ # midori = 3 min of CPU time, OpenHAB 2.75 minutes
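For comparing these numbers, it helps to divide the load average by the core count: the Pi Zero W has a single core and the Zero 2 W has four. Linux also counts tasks blocked in uninterruptible sleep (typically disk I/O) in the load average, so a slow SD card or heavy swapping raises it too. A minimal sketch; load_per_core is a hypothetical helper, not part of the image.

```shell
#!/bin/sh
# Sketch: normalise a load average by the number of CPU cores.
# Linux counts both runnable tasks and tasks in uninterruptible sleep
# (usually waiting on disk I/O, e.g. the SD card or swap) in the load.
load_per_core() {
  # $1 = load average, $2 = number of cores
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Numbers from the transcript above.
load_per_core 1.99 1   # Pi Zero W: 1 core
load_per_core 2.27 4   # Pi Zero 2 W: 4 cores

# On a live system: 1-minute load from /proc/loadavg, core count from nproc.
load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

With the transcript's numbers this prints 1.99 for the Zero W (roughly two tasks competing for its one core on average) and 0.57 for the Zero 2 W, followed by the live reading.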

The HTML for the local UI (scripts/oneui/index.html) uses a ton of JS code and looks like it was auto-generated from something. If we can make the code as simple as the UI, we should be able to reduce the load enough for the LCD UI to be responsive again.

If anyone knows where this file came from, I can start taking a look at that. If not, I can start re-writing a custom UI from scratch which is simpler and more performant.


@hestia_hacker
Is there a link to the image to help with testing, or are you still working on it?

I’m going to do some proper tests this weekend, but anecdotally, it feels like it responds just fine when the HVAC systems are off. I think the load might be specifically from the animation shown when the cooling & fan are on. If that is being done in JS, I could imagine it being taxing for a little Pi Zero.

After 22 hours of uptime, this image seems to work just fine if the heating/cooling/fan are all off. Every time I’ve walked by the thermostats on the electronics bench, I’ve tried to use the UI and it responded at least as fast as the Buster image (1.3-dev).

So this is some evidence that it could be the animation that’s causing the load. My next test will be to turn on the cooling & fan to see if I can get the lag issue to come back.

I’m also collecting more objective measurements such as the amount of CPU time used by the most CPU heavy processes so it’s easier to quantify the load of the current UI.
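One way to collect CPU time for the heaviest processes is procps ps sorted by accumulated CPU time. The sketch below also converts ps's [DD-]HH:MM:SS TIME column to seconds so different runs are easy to compare; cputime_to_seconds is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: convert a ps TIME value ([DD-][HH:]MM:SS) to seconds.
cputime_to_seconds() {
  printf '%s\n' "$1" | awk '{
    days = 0; t = $0
    # Optional "DD-" prefix for processes with more than a day of CPU time.
    if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
    n = split(t, p, ":")
    secs = 0
    for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
    print days * 86400 + secs
  }'
}

# Top 5 processes by accumulated CPU time (procps ps), plus the header line.
ps -eo pid,comm,time --sort=-time | head -n 6
```

For example, cputime_to_seconds 00:29:13 prints 1753.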

I’ll be interested in hearing if @ed1023 is able to reproduce my test results.

@hestia_hacker Question: should I be testing on the Zero W or the Zero 2 W?

Either would be okay, but I’d prefer testing with the original Pi Zero for a few reasons:

  1. I want the bullseye build to support all the existing HestiaPi thermostats out there
  2. The Pi Zero is available, the Zero 2 really isn’t
  3. The reset pin changed on the Pi Zero 2, so it’s less of a drop in replacement than I had expected (and for units powered by the HVAC system, that reset button can be kinda important)

Having said that, if you have both and are willing to test on both, that’d be great too. If the zero 2 becomes available again, I’d like to do some more experiments with it.

Ok so I have both set up with real hardware for development and testing.

Initial findings:
Note: any time I talk about memory, CPU, etc., I am just observing from htop. If you want something more accurate, I have a bash script to gather stats that I could adapt.

  • After about 30 min things settle down and touch is relatively responsive.
  • My systems both switch back and forth between the info screen and the main HVAC screen without me touching anything
  • During the first setup, I was seeing a “Failed to start hide the URL bar” message (not sure if this was causing things to stall on setup)
  • If you press too long on the touch screen, you get a context menu for the web browser
  • On the Zero W, the CPU was pegged for a good 10 min, and I would see the web browser with the URL field for about 1 min on startup

Side note: I’m sure you know this, but on the Zero 2 W the reset did not move; it changed from a through-hole to a pad. I solved this by putting a blob of solder on the pad and then some flux paste on it. Then I soldered the Pi to the HestiaPi PCB, shoved a piece of copper wire through, and soldered it in, and the reset button works. Note: I attached my Pi a little differently than the HestiaPi tutorial video shows; I used a set of header pins, so my Pi is spaced off the board a little.

Some updated observations: 12 hours in

  • First note: we need to use a high-quality SD card. I had an old SD card with no markings on it other than brand and size (SanDisk 16GB), and the system response was slow; when I switched to a newer one, I got better response times. My thinking is that part of the problem is storage IO.
  • Another thing I saw on both Zeros is that the swap space is almost always full. I wonder if this is causing storage IO problems. I was going to try removing it to see how things work.
  • Zero W touch screen response is about 1 sec
  • Zero 2 W touch screen response is unreliable: sometimes very fast, and sometimes it takes many seconds to respond (both are using the same Pi touch screen)
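To check whether a full swap file is actually generating SD card I/O, the kernel's cumulative swap counters (pswpin and pswpout in /proc/vmstat) can be sampled before and after an interval; vmstat 5 shows the same activity in its si/so columns. A minimal sketch with hypothetical helper names:

```shell
#!/bin/sh
# Sketch: count pages swapped in/out over an interval.
# pswpin/pswpout are cumulative page counters in /proc/vmstat.
swap_pages() {
  # $1 = a file in /proc/vmstat format; prints "pswpin pswpout"
  awk '$1 == "pswpin" { i = $2 } $1 == "pswpout" { o = $2 }
       END { print i + 0, o + 0 }' "$1"
}

swap_delta() {
  # $1 = interval in seconds
  interval="$1"
  before=$(swap_pages /proc/vmstat)
  sleep "$interval"
  after=$(swap_pages /proc/vmstat)
  # Intentional word splitting: $1 $2 = before in/out, $3 $4 = after in/out.
  set -- $before $after
  echo "swapped in: $(( $3 - $1 )) pages, out: $(( $4 - $2 )) pages"
}
```

For example, run swap_delta 60 while tapping the LCD: a full swap file with near-zero deltas is just pages parked in swap, while steadily climbing counts mean the SD card is being hit.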

TL;DR @ed1023 and I seem to have identified the biggest issues with the current bullseye image, and it’s going to take some significant re-work to fix, no matter what approach we take. It all comes back to kweb not being maintained anymore.

I’ve now done my second test of running the cooling & fan constantly for a long time and the pi zero takes a long time to respond. It’s also clear it is struggling to do the animation; the animation is not nice and smooth.

In this same test the zero 2 responded in less than a second, but not the nearly instant response from the previous test (where there were no animations).

The load averages on both models doubled!

I also am getting what appear to be phantom button presses when it is under heavy load. As best I can tell, it’s processing the same event multiple times. For @ed1023 that manifested as opening and closing the info pane. For me, it was pressing the “lower temperature setpoint” button. This got me to a very cold setpoint before I turned it off a few hours later (see pic below).

The speed of the SD card is an interesting point.

For the “Failed to start hide the URL bar”, where did you see that? Was it in the UI somewhere, or was that buried in a log message somewhere?

The issues that I am tracking:

  1. Huge lag on LCD button presses
  2. Phantom LCD button presses
  3. Long click = right click (find a way to disable?)

I’m hiding the menu bar as quickly as possible. Unfortunately it requires a click… oh my goodness, that’s probably the source of the phantom button presses! Crap.

I’ll have to think about that one. I think I spent about 6 hours trying to get the URL bar to be hidden by default and found that until the user clicks on the page, it just insists on showing that URL bar. So I modified another person’s script that automated this click. I don’t have any way to know if the click event actually registered and caused the URL bar to go away, so I was just running it repeatedly.

I’d be glad to abandon this hacky workaround, but I don’t want the URL bar showing either. I am almost certain Firefox and Chromium will require more memory than that little pi can spare, so I guess this means investigating Vivaldi, Luakit, and Epiphany.

If we can get a browser that will auto-hide the browser UI and just show the webpage by default, we can do away with this hacky script and maybe slim down the HTML in the UI enough to be usable.

As for bringing back kweb, the closest thing I found to confirming “that ain’t gonna happen” is this: Kweb Suite (Minimal Kiosk Browser, omxplayerGUI) - Page 64 - Raspberry Pi Forums

Playing devil’s advocate here, but do we need OpenHAB? It seems a little resource-intensive when viewed from a Pi Zero. Should we look into something like updating and bringing in Jan Bonne’s IOThermostat work?

I expect that eventually, we’ll have to switch away from OpenHAB because they (and all implementations of the Java Virtual Machine) are abandoning support for the Pi Zero (and other low end hardware). The only way we can upgrade to OpenHAB3 is to ditch the Pi Zero and go to the Zero 2 (or something else), thus either doubling the maintenance or abandoning all the existing HestiaPi boxes out there.
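If the build ends up having to treat the two models differently, a script can tell them apart by CPU architecture. A minimal sketch, assuming 32-bit Raspberry Pi OS: the Pi Zero W's ARM11 core reports armv6l, while the Zero 2 W's Cortex-A53 typically reports armv7l under a 32-bit kernel (aarch64 under a 64-bit one). is_armv6 is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: detect an ARMv6 board (Pi Zero / Zero W) from `uname -m`.
is_armv6() {
  case "$1" in
    armv6*) return 0 ;;
    *)      return 1 ;;
  esac
}

if is_armv6 "$(uname -m)"; then
  echo "ARMv6 (Pi Zero / Zero W class)"
else
  echo "Not ARMv6 (e.g. Pi Zero 2 W: armv7l or aarch64)"
fi
```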

Having said that, OpenHAB2 is still supported for now, so I’d like to postpone switching away from it as long as possible.

In the department of good news, I managed to get Midori working (not showing the URL bar) without my janky little clicker service. The issue was that Midori v7.0 has a bug which was fixed at some point at or before v9.0. I managed to download the source and compile it and that solved that.

I was not able to reproduce the long click on the touch screen bringing up a context menu (it may have been fixed by updating to Midori v9.0; I only tested on the hardware & images that I have).

This means that the only remaining issue is to bring down the CPU load from the LCD webpage. I think I managed to find the source code that generated the LCD UI, but it’s not very helpful; I expected to see some HTML in there somewhere that I could use as a starting point. I should be able to come back to the HTML-based UI within the next few days.

I’ve been making the changes on my live system, so I don’t have a new image to share with my progress. It’d probably be easiest to just wait for the build, but if anyone is raring to go, the Midori source is at GitHub - midori-browser/core: Midori Web Browser - a lightweight, fast and free web browser using WebKit and GTK+. Do a git checkout v9.0 to get the 9.0 release, then follow the instructions in the repo’s README to build & install it. You can check the version of Midori you have with midori -V. It should show 7.0 before you do the build/install and 9.0 afterwards.
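For anyone trying the manual route, a small sketch of the fetch and the version check. The repository and the v9.0 tag come from the steps above; the build itself follows the README and is not reproduced here. The function names are hypothetical, and the version extraction assumes the first dotted number in the midori -V output is the version, which is an assumption rather than a verified output format.

```shell
#!/bin/sh
# Sketch: fetch Midori v9.0 and check the installed version is >= 9.0.

# ASSUMPTION: the first dotted number in `midori -V` output is the version.
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# True if version $1 >= version $2, using sort's version ordering (-V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if [ "${1:-}" = "--fetch" ]; then
  git clone https://github.com/midori-browser/core.git midori-core
  git -C midori-core checkout v9.0
  # Build and install per the repository's README (not reproduced here).
fi

if command -v midori >/dev/null 2>&1; then
  v=$(midori -V 2>/dev/null | extract_version)
  if version_ge "${v:-0}" 9.0; then
    echo "midori $v: OK"
  else
    echo "midori ${v:-unknown}: older than 9.0, rebuild from the v9.0 tag"
  fi
fi
```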

Just a quick update, and all positive news!

I’ve managed to update the existing UI to eliminate the animations, and now there is a bar above whichever function is active. It isn’t as polished a look as the animations, but it is more functional.

As mentioned before, I have Midori 9.0 running without any hacky workarounds.

If testing goes well, I will submit the UI changes upstream to @HestiaPi and update the build pipeline with the other changes.

Happy Friday!

I was still seeing the screen be unresponsive sometimes. I can’t tell if I’m not tapping properly and the clicks aren’t registering or if it’s a performance issue because of switching to Midori (or from switching to Bullseye).

So I ported kweb to use webkit2 (instead of the original webkit) and got it running on Bullseye.

So, yeah, after all that work experimenting with Midori, it’s likely that I’m going to abandon it and switch back to kweb (albeit a significantly different version). I’m testing it now, with the non-animating UI to give it the best chance at being acceptably performant.

Some clicks don’t seem to register at all and other clicks work just fine, but I can’t tell if it’s a hardware issue, a software problem, or user error. The CPU load, RAM & swap usage are all still higher than I’d like, but so far if the click registers, the UI seems to respond in a reasonable amount of time.

I’m working on building a proper image using the CI system (with the original, animated UI), but in my first attempt I forgot to modify kiosk-xinit.sh to include the protocol handler for kweb (it has to be file:///home/pi/scripts/kiosk-xinit.sh instead of just /home/pi/scripts/kiosk-xinit.sh). I’ll have a new build in a day or two, but if you want to make that change manually, you can get the broken image here: Artifacts · bullseye_hestiapi_ansible (#5742) · Jobs · hax0rbana_public / raspberrypi-automation · GitLab. I’ve been putting "file:///home/pi/scripts/kiosk-xinit.sh" in quotes like that, but I think the quotes are optional.
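Since kweb needs the scheme, a launcher could normalize paths before passing them on. A minimal sketch; to_file_url is a hypothetical helper, not something that exists in kiosk-xinit.sh.

```shell
#!/bin/sh
# Sketch: turn an absolute path into a file:// URL, leaving anything that
# already has a scheme (http://, file://, ...) untouched.
to_file_url() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    /*)    printf 'file://%s\n' "$1" ;;
    *)     echo "to_file_url: need an absolute path or URL: $1" >&2; return 1 ;;
  esac
}
```

For example, to_file_url /home/pi/scripts/kiosk-xinit.sh prints file:///home/pi/scripts/kiosk-xinit.sh, and a URL that already has a scheme passes through unchanged.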

@hestia_hacker if you have an image I can test, I have several displays to test with, and I have some new Pi Zero Ws. I also have some of the new clicky boards.

@jrtaylor71, what are the new clicky boards? If applicable, please share link.

@hestia_hacker I will also test when the image is available

There’s a thread about the clicky board and it has a link to the page with the PCB files and BOM.

If you’re willing to ignore the fact that the wifi setup screen is a huge error instead of a nice UI, you can test with this image. https://gitlab.hax0rbana.org/hax0rbana_public/raspberrypi-automation/-/jobs/5754/artifacts/raw/hestia-pi-ONE-v1.3-dev-bullseye-5754.img.xz As long as you know to connect to the HESTIAPI access point and enter your wifi creds, you can get it to go to the main UI; it’s just the setup screen that is broken.

I messed up the quoting the last time around. My code should be fixed now, but it takes at least 6 hours to build a new image from scratch.

Also, I went back to the 1.3-dev image real quick and the touchscreen registers clicks just fine, so it’s neither a hardware issue nor user error. It’s something in the updated software. After I get some benchmarks on the 1.3-dev (buster) image to establish a baseline, I’m going to try putting the version of kweb that uses webkit2 onto the buster image and see if it performs like the new bullseye image, or if it stays fast. That should tell us if the performance issue is in webkit2 or something in bullseye.

@ed1023 PCBA Info – HestiaPi

These also fit in the 3d printable case.

Testing hestia-pi-ONE-v1.3-dev-bullseye-5754.img.xz, I could connect to wifi and set the SSID and password; it then booted up and displayed this error:

Error opening file /home/pi/scripts/openhabloader.html &: No such file or directory

Not sure if this is the error @hestia_hacker was talking about.
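One plausible reading of this error is that the & meant to run the browser in the background ended up inside the quoted argument, so kweb was asked to open a file literally named “openhabloader.html &”. A minimal sketch of the difference, using a hypothetical show_args function as a stand-in for kweb:

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for kweb: it prints each argument
# it receives between angle brackets.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Wrong: the & is inside the quotes, so it becomes part of the path
# instead of telling the shell to run the command in the background.
show_args "/home/pi/scripts/openhabloader.html &"
# prints: </home/pi/scripts/openhabloader.html &>

# Right: quote only the path; the & stays outside and backgrounds the command.
show_args "/home/pi/scripts/openhabloader.html" &
wait
# prints: </home/pi/scripts/openhabloader.html>
```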

Hey @ed1023, sorry about the error in that last image. Apparently I also SSHed in and fixed the quoting issue after getting the wifi set up and I forgot that I did that step. But now I have a proper image for testing, expected test results, and I think I’ve finally identified the cause of the performance issue.

The image for testing: Artifacts · bullseye_hestiapi_ansible (#5815) · Jobs · hax0rbana_public / raspberrypi-automation · GitLab

The expected result is that it will be horribly slow sometimes and reasonably responsive at other times. While it’s possible to use it in this state, I find it frustrating and consider it wildly unacceptable.

The problematic software seems to be webkit2, which is most unfortunate. I suspect webkit2 because I set up kweb with webkit2 on the 1.3-dev image: it worked fine out of the box (with the original webkit), but after I switched to webkit2, it exhibited the high load averages, large memory usage, and, most importantly, the sometimes unresponsive UI.

I’m now switching my test unit back to the original kweb just so I can quantify the performance differences. I’ll post the metrics once I can provide a nice side-by-side comparison. If you want to collect the exact same things I’m collecting:

  • uname -a
  • free -h; uptime; date
  • top -b -n 1 -o +%MEM | head -n 12 | tail -n 6
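To take these at regular intervals without babysitting the unit, they can be wrapped in a small script. A minimal sketch; the log path, script name, and 10-minute default are hypothetical choices, and it uses top's batch mode (-b -n 1) so the output is plain text when piped.

```shell
#!/bin/sh
# Sketch: append the measurements listed above to a log file, one
# separated sample at a time, so runs on different images can be compared.
LOG="${LOG:-$HOME/hestiapi-stats.log}"   # hypothetical default path

collect_stats() {
  echo "=== sample ==="
  uname -a
  free -h; uptime; date
  # -b (batch) and -n 1 (one iteration) keep top's output plain text.
  top -b -n 1 -o +%MEM | head -n 12 | tail -n 6
}

if [ "${1:-}" = "--loop" ]; then
  # Take a sample every ${2:-600} seconds (10 minutes by default).
  while :; do
    collect_stats >>"$LOG" 2>&1
    sleep "${2:-600}"
  done
fi
```

For example, sh collect-stats.sh --loop 300 & samples every five minutes in the background.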

I believe the root cause is memory pressure, but I need to collect more information to be sure. The CPU load is also higher, especially with the flashing icons, so I’m not completely confident that it’s purely memory pressure. It seems like there may be multiple things going on here that need to be teased apart.

The path forward

If I am correct that the issue is webkit2 taking up more resources than the Pi Zero has, we’re in a tough spot. Switching to the Pi Zero 2 would leave all the existing units stuck at Buster forever. Plus, my testing thus far indicates that the Pi Zero 2 has the same problem, although perhaps less frequently. Furthermore, the Zero 2 is literally unavailable, whereas the Zero is currently just limited in availability.

Here are my thoughts on things that might help:

  • Add swap space and require everyone to get lightning fast SD cards
    • May still not be sufficient
  • Find some way to get kweb (w/ webkit1) running on Bullseye and later
    • I don’t think webkit1 is still being maintained
  • Find another (actively maintained) web rendering engine and browser
    • I looked and couldn’t find anything that is even as good as kweb w/ webkit2, let alone better
  • Find a way to keep the browser in memory and give it priority
    • Maybe nice would help here? Not sure that prevents it from being swapped out though
  • Modify the local UI to somehow require fewer resources
    • Change the web UI to not have flashing icons
      • Code changes for this are done, not sure if it will be enough impact to solve the problem
    • Reduce JS usage?
  • Rewrite the local UI completely and don’t make it web based
    • Lots of work, but this should be a viable solution otherwise
  • Ditch the local UI completely (no screen)
    • Would basically be a different model HestiaPi at that point
    • Requires changing the case design
  • Make the local UI be display only
    • Again, this is changing the functionality of the device
  • Add a second pi (one to run OpenHAB, the other to run the LCD UI)
    • This is getting a bit ridiculous, but I’m running out of ideas…
    • This would give us memory for the UI because it’s not competing with OpenHAB
    • The OpenHAB pi could be hardware that can run newer versions of Java and thus OpenHAB3
    • The OpenHAB pi could be physically located somewhere else (though it does need to connect to the physical HVAC wires)
    • The UI pi could be located anywhere and would only need 5V DC
    • Existing hardware could function as either the OpenHAB pi or the LCD pi
    • The drawbacks are: added complexity, not a single standalone unit, requires wifi for any functionality, more expensive, lots of development and testing would be required to actually accomplish this
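For the “add swap space” option: Raspberry Pi OS manages swap with dphys-swapfile, and the 99Mi of swap in the earlier free -h output matches its stock CONF_SWAPSIZE=100. A minimal sketch for resizing it, assuming the HestiaPi image keeps that stock setup (set_swapsize and the script name are hypothetical). Two caveats: more swap on a slow SD card may make the lag worse rather than better, and nice only changes CPU scheduling priority, so it does not stop the browser's pages from being swapped out.

```shell
#!/bin/sh
# Sketch: resize the dphys-swapfile swap file (CONF_SWAPSIZE is in MB).
# ASSUMPTION: the image keeps Raspberry Pi OS's stock /etc/dphys-swapfile.
set_swapsize() {
  # $1 = config file, $2 = size in MB; rewrites (or appends) CONF_SWAPSIZE
  if grep -q '^CONF_SWAPSIZE=' "$1"; then
    sed -i "s/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=$2/" "$1"
  else
    echo "CONF_SWAPSIZE=$2" >>"$1"
  fi
}

if [ "${1:-}" = "--apply" ] && [ -n "${2:-}" ]; then
  # Run this branch as root, e.g. sudo sh resize-swap.sh --apply 512
  dphys-swapfile swapoff
  set_swapsize /etc/dphys-swapfile "$2"
  dphys-swapfile setup
  dphys-swapfile swapon
fi
```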

In all likelihood, it’ll require some combination of the above. The option I like best so far is re-writing the UI, which is unfortunately a lot of work and testing. It is likely to be a viable solution in spite of our hardware limitations, remains a single standalone unit, maintains compatibility with existing hardware and it doesn’t sacrifice functionality (still works when the wifi goes down, usable by people in your house but not on your wifi network (e.g. guests)).