I believe the rules file was kept empty until openHAB was fully loaded, and then the file was copied. Keep in mind that there are two places where rules are stored. The first is the DB that you can access from PaperUI; the second is the file you mention, which is empty (moved/hidden) at the beginning until the Initialization rule prepares OH. Google Drive, although I’m not a big fan, allows such files.
If I recall correctly, and I’ve been steeped in OH 3.x for over a year now, that error is not unusual. Every time a file is changed there are two events, one when the file starts to change and one when it completes. Unfortunately both events trigger OH to try to read the file, and the file is basically empty on the first event. So you’d see this error and then it’d be followed by a successful load of the file.
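To illustrate the two-event behaviour described above, here is a minimal Python sketch of a "wait until the file has settled" check. It is not how openHAB handles file events; `wait_until_stable` and its parameters are hypothetical names for illustration, and the idea is only that a reader can skip the first, empty state by waiting for a short quiet period.

```python
import os
import time


def wait_until_stable(path, settle=0.5, timeout=10.0, poll=0.1):
    """Return True once `path` is non-empty and unchanged for `settle` seconds.

    Hypothetical helper: returns False if that never happens within `timeout`.
    """
    deadline = time.monotonic() + timeout
    last_sig = None
    stable_since = None
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            sig = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            sig = None
        now = time.monotonic()
        if sig is None or sig[0] == 0 or sig != last_sig:
            # Missing, empty, or still changing: restart the quiet period.
            last_sig = sig
            stable_since = now
        elif now - stable_since >= settle:
            return True
        time.sleep(poll)
    return False
```

A loader built this way would ignore the "file started changing" event and only parse once the content has stopped moving, at the cost of a small delay.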
If you see “Loading model ‘default.rules’” instead of another error, you know that it loaded correctly.
However, the “Ignoring file ‘defaul.rules’ as we do not have a parser for it” message is a new one for me. That implies that there is something wrong with the rules engine, but that’s just there by default so I’m not sure what would cause that error. Did you kill OH immediately after seeing that other error? That could account for it. The rule engine was killed and OH started closing down before the rules file was parsed.
Remove the .rules file from OH’s folder and restart OH. This time give it a chance to fully come up before copying the file over. There might be stuff getting in the way.
It’s not part of the install process. It’s part of the startup process in kiosk-xinit.sh if I remember correctly.
Because the RPi 0 is really underpowered to run OH, and parsing a .rules file is really heavy on the CPU, delaying that step until everything else starts up actually ended up saving a significant amount of time on the boot process which, before I started, was well over five minutes.
All that this default.rules file does is set a flag (in an Item) to indicate that OH and all the rest of the HestiaPi system has finished coming online, which allows the rules that actually drive the thermostat to run.
The PaperUI rules (written in ECMAScript 5.1, which OH runs on the Nashorn engine) can be found in /var/lib/openhab/jsondb. Moving most of the rules out of .rules files and into the JSON DB saved minutes of boot time.
Winter is a bad time to be taking one’s thermostat offline, so progress continues to be slow, but I did some static analysis and found that on a working HestiaPi, the hostapd systemd service is “active (exited)”. I wanted to understand why it exited cleanly on the v1.2 image but didn’t do so on my self-built image. On the working image, systemd reports /etc/init.d/hostapd as the module it’s loading. I compared that file on each of the two images, and it’s identical, as is /etc/default/hostapd. There’s one line that sticks out to me here, which is this:
[ -n "$DAEMON_CONF" ] || exit 0
The -n check is “True if the length of string is nonzero.”
DAEMON_CONF is set to an empty value on line 19, which explains why the production hestiapi is exiting cleanly. It does not explain why my build does not exit cleanly.
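For anyone who wants to reason about that guard without booting a Pi, here is a rough Python emulation of `[ -n "$DAEMON_CONF" ] || exit 0`. It assumes the variable is set with plain `KEY=value` shell lines; `read_shell_var` and `would_start_daemon` are hypothetical helper names, and the parsing is much simpler than what a real shell does.

```python
import shlex


def read_shell_var(text, name):
    """Return the last value assigned to `name` in simple KEY=value lines, or ''."""
    value = ""
    for line in text.splitlines():
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # Skip lines this simple parser cannot handle.
        if len(tokens) == 1 and tokens[0].startswith(name + "="):
            value = tokens[0][len(name) + 1:]
    return value


def would_start_daemon(shell_text):
    """Mirror `[ -n "$DAEMON_CONF" ] || exit 0`: empty means exit cleanly."""
    return len(read_shell_var(shell_text, "DAEMON_CONF")) > 0
```

With an empty `DAEMON_CONF` the function returns False, which corresponds to the init script exiting with status 0 and systemd showing “active (exited)”.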
So I did what any good hacker would do: grabbed an extra hoodie and turned off the heat so I could test a new image on my open source thermostat.
Sure enough, the hostapd service on my image is NOT /etc/init.d/hostapd, but rather /lib/systemd/system/hostapd.service, which has no such checks for whether DAEMON_CONF is empty or not (in fact, DAEMON_CONF isn’t even a thing in my .service file). So that solves that mystery.
Masking the hostapd service, removing the hostapd.service file in /lib/systemd/system and then unmasking it caused the correct service file to be picked up, and starting the service then acted just like the production hestiapi. That should resolve the hostapd issue.
Yeap… that’s the way
After fixing the hostapd service, openhab seems to inexplicably be in a better place. I now get the flame/snowflake/fan icons and the temperature set point is showing with the default value of 70F.
While better, it’s still not up to the bar of “functional” yet, as the temperature and humidity are both showing up as zeros. In the logs, there’s a clue…
2022-01-29 19:31:58.864 [WARN ] [.transport.mqtt.MqttBrokerConnection] - Failed subscribing to topic hestia/external/+/humidity
java.util.concurrent.CompletionException: com.hivemq.client.mqtt.exceptions.MqttSessionExpiredException: Session expired as connection was closed.
...
2022-01-29 19:40:09.422 [INFO ] [marthome.model.script.initialization] - System is ready to operate, kicking off restored behaviors
2022-01-29 19:40:10.893 [ERROR] [e.automation.internal.RuleEngineImpl] - Failed to execute rule '32223121-5acf-423f-a9f5-1dffbe665927': Fail to execute action: 2
Of course, the 32223121-5acf-423f-a9f5-1dffbe665927 rule! Why didn’t I think of that? Thanks to @rlkoshak’s tips on where I can find the rules file, I was able to grep my way to finding the culprit. It’s in /var/lib/openhab2/jsondb/automation_rules.json and the script is… large, but it’s the one labeled “Initialize all the setting Items” (in case that UUID was generated at runtime).
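As an alternative to grep, a small Python sketch can report where in the JSON the UUID appears. It only assumes the file is valid JSON and makes no assumption about how the JSON DB nests its entries; `find_occurrences` and `search_file` are hypothetical helper names.

```python
import json


def find_occurrences(node, needle, path=()):
    """Yield key paths where `needle` appears as a key or inside a string value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if needle in str(key):
                yield path + (key,)
            yield from find_occurrences(value, needle, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_occurrences(value, needle, path + (index,))
    elif isinstance(node, str) and needle in node:
        yield path


def search_file(json_path, needle):
    """Load a JSON file and return every path at which `needle` occurs."""
    with open(json_path, encoding="utf-8") as handle:
        return list(find_occurrences(json.load(handle), needle))
```

Calling `search_file("/var/lib/openhab2/jsondb/automation_rules.json", "32223121-5acf-423f-a9f5-1dffbe665927")` on the Pi would list the key paths to inspect, which is easier to read than a single very long grep match.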
I found this rule in the PaperUI and ran it there so I could see what shows up in the logs. Here’s what I got:
2022-01-29 19:51:02.567 [INFO ] [marthome.model.script.initialization] - Comfort_Mode is COMFORT, Default value is 1
2022-01-29 19:51:02.881 [INFO ] [marthome.model.script.initialization] - Comfort_Value is 1, initializing to 1
2022-01-29 19:51:04.983 [INFO ] [marthome.model.script.initialization] - System is ready to operate, kicking off restored behaviors
2022-01-29 19:51:05.936 [WARN ] [e.automation.internal.RuleEngineImpl] - Fail to execute action: 2
java.lang.IllegalArgumentException: The argument 'command' must not be null.
    at org.eclipse.smarthome.core.events.AbstractEventFactory.checkNotNull(AbstractEventFactory.java:117) ~[?:?]
    at org.eclipse.smarthome.core.items.events.ItemEventFactory.assertValidArguments(ItemEventFactory.java:390) ~[?:?]
    at org.eclipse.smarthome.core.items.events.ItemEventFactory.createCommandEvent(ItemEventFactory.java:222) ~[?:?]
    at org.eclipse.smarthome.core.items.events.ItemEventFactory.createCommandEvent(ItemEventFactory.java:238) ~[?:?]
    at org.openhab.core.automation.module.script.internal.defaultscope.ScriptBusEvent.sendCommand(ScriptBusEvent.java:94) ~[?:?]
The code right after the point where Comfort_Mode and Comfort_Value get printed to the logs is related to the humidity, which lines up with the error seen earlier.
def = DEFAULTS.get("Humi_DEF");
min = DEFAULTS.get("Humi_MIN");
max = DEFAULTS.get("Humi_MAX");
initSetpoint("HumiSetpoint", 50, 100, 0);
I’m not really sure where to go from here. I know that the hardware in this pi is working correctly, because everything works fine when I switch back to v1.2. Any ideas on what’s going on here or how I can debug further?
The temp/humi values are obtained by executing a script. For OH to be able to call a script the “Exec” binding had to be installed. Can you make sure this is installed and functioning ok?
I also remember there was an extra security need that each script had to be included in a whitelist but this should not be needed anymore (HestaPi stuck at 0°-screen (no. 15 of First boot ONE-tutorial) - #16 by rlkoshak).
Scripts to pull data were not run properly. TL;DR the i2c_dev kernel module wasn’t loaded.
pi@raspberrypi:~/scripts $ ./getBMEtemp.sh
Traceback (most recent call last):
  File "/home/pi/scripts/bme280.py", line 30, in <module>
    bus = smbus.SMBus(0) # Rev 2 Pi, Pi 2 & Pi 3 uses bus 1
IOError: [Errno 2] No such file or directory
It’s not an ImportError, but I checked to see if I had the library anyway. I do.
pi@raspberrypi:~/scripts $ apt list python*smbus*
Listing... Done
python-smbus/oldoldstable,now 3.1.2-3 armhf [installed]
python3-smbus/oldoldstable 3.1.2-3 armhf
I took a look at the library and I see that it appears to be using FFI to call into native code (the shared object file).
pi@raspberrypi:~/scripts $ dpkg -L python-smbus
/.
/usr
/usr/lib
/usr/lib/python2.7
/usr/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages/smbus.arm-linux-gnueabihf.so
/usr/share
/usr/share/doc
/usr/share/doc/python-smbus
/usr/share/doc/python-smbus/changelog.Debian.gz
/usr/share/doc/python-smbus/changelog.gz
/usr/share/doc/python-smbus/copyright
I thought “but Hestia, how is this any different than the dev 1.2 image, which works just fine?” Great question, Hestia, glad you asked. That one is using an older version of the library, 3.1.1+svn-2 instead of 3.1.2-3.
Why a newer version would appear to be completely broken, I have no idea. Let’s keep digging, shall we? I ran the python script under strace -f and found that at some point it was trying to open /dev/i2c-1 and failing. I looked in /dev and found that there weren’t any i2c* devices in there. Comparing that to a Pi running the 1.2-dev image, I could see that this device file should exist.
Awesome. Looking at the kernel modules reveals that I do have i2c_bcm2835 on my image, but not i2c_dev. Looking back at the manual installation instructions, I don’t see anything about making sure this kernel module is installed or loaded.
OK, fine. “Do we even have this kernel module?”, I wondered. I took a long shot and figured I’d just blindly try to load it without checking to see if it exists first. What could go wrong, right? I ran modprobe and… no output. Could it have been that that easy? Yup!
pi@raspberrypi:~/scripts $ sudo modprobe i2c_dev
pi@raspberrypi:~/scripts $ lsmod | grep i2c
i2c_dev 7171 0
i2c_bcm2835 7818 0
pi@raspberrypi:~/scripts $ ls -l /dev/i2c-1
crw-rw---- 1 root i2c 89, 1 Mar 8 00:18 /dev/i2c-1
pi@raspberrypi:~/scripts $ ./getBMEtemp.sh
67
I added it to /etc/modules and rebooted to verify that was all that was needed and it worked perfectly.
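The manual checks above (lsmod, then looking for the device node) could be folded into a small preflight script, sketched here in Python. It assumes a Linux system where `/proc/modules` lists loaded modules with the module name as the first field; the function names and the `REQUIRED_MODULES` tuple are hypothetical, chosen to match the two modules named in this post.

```python
import os

# The two modules discussed in this thread; adjust for other hardware.
REQUIRED_MODULES = ("i2c_bcm2835", "i2c_dev")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required modules that are not currently loaded, in order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def diagnose(proc_modules_path="/proc/modules", device="/dev/i2c-1"):
    """Return (missing module names, whether the I2C device node exists)."""
    with open(proc_modules_path) as handle:
        missing = missing_modules(handle.read())
    return missing, os.path.exists(device)
```

Running `diagnose()` before the sensor script would turn the opaque `IOError: [Errno 2]` into a direct hint such as "i2c_dev is not loaded".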
After this, the LCD interface shows the temperature and everything is good. I hope that someday someone is having trouble with smbus not working as expected and they find this post that shows them how to troubleshoot the issue and get it resolved. It’ll be like the opposite of xkcd: Wisdom of the Ancients
@hestia_hacker wow. This is great news and very thorough troubleshooting!
OK, I have an update. It’s a mixed bag.
On the plus side, I have gotten the build process to work and split it up into jobs that take no longer than 3 hours each. This was required by gitlab.com, as their CI limits each job to 3 hours regardless of how many CI minutes are left on your account. This means we can build the 1.2-dev image for jessie, stretch, or buster. The point of going through all of this work was to have a process to reliably create a clean disk image, and we do have that.
The bad news is that I’ve since learned about gitlab.com’s 1GB limit on artifacts, which is not something that can be easily worked around. This limit also applies to all subscription levels, so having the Ultimate package doesn’t help here either. I’ve contacted the support team and they have confirmed there’s no way to increase it.
I’d really like us to have an automated build process, with the resulting image on a publicly accessible server as well, so users could just go there and download the latest image. At the same time, anyone who wanted to contribute to the build process could create a branch and use this same infrastructure so we could test each other’s experimental builds.
At this point, I’m going to ask some friends if they have any CI/CD solutions that they might be willing to let us use (for free) to get what we actually want. It’s for a good cause, so hopefully someone will be willing to take on the cause.
That’s great news! I’ve pinned this topic for a month to get maximum attention in the forum in case someone here can help.
Are there any shareable image files from your work we could have a look at and test as a community?
Here’s a happy image:
I realized that I forgot to clear out the ~/.ssh/authorized_keys file, so I’ll need to update the scripts to do that. I’ll also probably make the very last step clean up anything that might be left over in the home directories that shouldn’t be there and then fill the free space on the disk with zeros (so it’ll compress better).
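The zero-fill step can be sketched in Python as follows. This is not the build script itself, just an illustration of the idea: write zeros to a scratch file until the disk reports it is full, then delete the file so the freed blocks hold zeros and compress well. `zero_fill` is a hypothetical helper, and the `max_bytes` cap exists only so the function can be tried safely.

```python
import errno
import os


def zero_fill(scratch_path, max_bytes=None, chunk_size=1024 * 1024):
    """Write zeros until the disk is full (or `max_bytes`), then delete the file.

    Returns the number of bytes written before cleanup.
    """
    written = 0
    zeros = bytes(chunk_size)
    try:
        # Unbuffered, so a "disk full" error cannot resurface when closing.
        with open(scratch_path, "wb", buffering=0) as handle:
            while max_bytes is None or written < max_bytes:
                size = chunk_size
                if max_bytes is not None:
                    size = min(chunk_size, max_bytes - written)
                try:
                    written += handle.write(zeros[:size])
                except OSError as error:
                    if error.errno == errno.ENOSPC:
                        break  # Disk is full, which is the goal here.
                    raise
    finally:
        if os.path.exists(scratch_path):
            os.remove(scratch_path)
    return written
```

On a real image this would be called without `max_bytes` as the last build step, after the home directories have been cleaned.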
There’s likely to be some other little things that need to be done, but hopefully this revision is at least functional.
Hopefully we’ll find a host because Mega only gives me 20G of free storage, and that just used about 6 of them.
I did try the buster image and it looks like openHAB, sensor and relays work but the LCD doesn’t. The LCD on the Stretch image, on the other hand, did work. Any idea off the top of your head @hestia_hacker why that might be? Everything seemed ok but I didn’t do a thorough test in case the image will need modifications…
The docs on touch screen support only seem to talk about the kind that attach via the ribbon cable.
I wouldn’t be surprised if the LCD-show package goes away from raspberrypi.com/Downloads at some point as it appears to be about 5 years old. I found a repo on GitHub that might be that same LCD-show package. That was updated a month ago, so it seems like trying the latest version is a logical first step.
I’ll try to reproduce your results and then see if I can get it working on Buster. Glad to hear that stretch passed a smoke test. Spring and Fall are a great time to take my thermostat offline.
Got the LCD to display and touch screen register points correctly (although not 100% precise). I’m not entirely sure where my PR should go and how to actually test it as I’m not familiar with the CI/CD environment so please excuse me for pasting below the commands I entered on the Buster image:
sudo rm -rf LCD-show
git clone https://github.com/goodtft/LCD-show.git
chmod -R 755 LCD-show
cd LCD-show/
# We don't want to reboot right now, so we'll patch that part out
sed -i 's/^sudo reboot/#sudo reboot/' ./LCD35-show
sudo ./LCD35-show
As that last sudo ./LCD35-show removes our commands from autostart, we need to re-add them:
grep raspberry-pi-turnkey /etc/rc.local || \
(grep -v "exit 0" /etc/rc.local; echo "su pi -c '/usr/bin/sudo /usr/bin/python3 /home/pi/scripts/raspberry-pi-turnkey/startup.py &'"; echo "su -l pi -c 'sudo xinit /home/pi/scripts/kiosk-xinit.sh'"; echo "exit 0") > rc.local && sudo mv rc.local /etc/rc.local
sudo chmod +x /etc/rc.local
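The same "only add the lines if they are missing, and keep exit 0 last" logic can be expressed as a pure text transformation, which is easier to test than a one-liner. The Python sketch below is an illustration, not part of the build scripts; `ensure_before_exit` is a hypothetical helper. Unlike `grep -v "exit 0"`, it only drops lines that are exactly `exit 0`, so comments that merely mention the phrase are preserved.

```python
def ensure_before_exit(rc_text, commands, marker):
    """Return rc_text with `commands` placed before a final `exit 0`.

    If `marker` already appears in rc_text the text is returned unchanged,
    which makes the operation safe to repeat.
    """
    if marker in rc_text:
        return rc_text
    kept = [line for line in rc_text.splitlines() if line.strip() != "exit 0"]
    return "\n".join(kept + list(commands) + ["exit 0"]) + "\n"


# The two autostart commands quoted in this post.
AUTOSTART = [
    "su pi -c '/usr/bin/sudo /usr/bin/python3 "
    "/home/pi/scripts/raspberry-pi-turnkey/startup.py &'",
    "su -l pi -c 'sudo xinit /home/pi/scripts/kiosk-xinit.sh'",
]
```

A build step would read /etc/rc.local, pass its text through `ensure_before_exit(text, AUTOSTART, "raspberry-pi-turnkey")`, and write the result back with root privileges.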
Replace contents of
with below block
Section "InputClass"
    Identifier "calibration"
    MatchProduct "ADS7846 Touchscreen"
    Option "Calibration" "3934 252 1298 3563"
EndSection
Next thing I noticed missing is the info on the countdown-loading screen (IP and MAC). I have an idea I’ll check another time.
The next big stop is upgrading to the latest openHAB along with all our configuration, rules etc… Rich had some good guidelines for that.
I put your change in the build scripts and built an image. When I put it on an SD card and booted it, I was able to get to the “connect to the AP and enter your creds” page and that seemed to work. When it rebooted, I got to a screen where the left side just said “OFF” and the flame, snowflake and fan icons didn’t do anything.
The info button did work though, and I saw it was successfully getting on the WiFi. I connected to port 8080 and got a 403-Forbidden error from Jetty. I waited a bit longer and got the same error. I don’t have time to look into it further right now because our guests will be arriving shortly, but I’ll report back when I dig in and figure out what’s failing.
Your last commit on hestiapi.sh does not include the calibration block from above.
Please change the 4 numbers on line 165.
That should make all LCD areas work.
Please allow 10-15 minutes on the very first run before visiting the web UI or a phone App. Keep in mind that the webUI keeps some connection open and retries to connect in the background, so if you had a tab open on your browser from a previous successful boot, don’t just leave it open.
Did a one character edit on kiosk-xinit.sh to fix that.
Sooo Buster is running fine… next stop upgrade openHAB!
@rlkoshak are your instructions still valid with today’s OH version (coming from 2.5.12) ?
I don’t know. Which instructions in specific are you referring to here? I don’t think I wrote any upgrade instructions yet and I’m certain there will be some necessary changes to the config. The rules should pretty much work as written but OH 3 completely dropped support for 1.x version bindings. IIRC when I reworked the rules we were still using the 1.x GPIO binding (because there wasn’t a 2.x version binding yet). There is now a 2.x style GPIO binding but that means working with Things now instead of Item configs.
So the PinXX Items would need to be reworked (probably best to make them managed like all the rest) and of course the new binding would need to be installed and a Thing created and configured.
Beyond that, I think everything else is OH 3 ready.
There are also a ton of new features that we can use to greatly reduce the complexity and amount of rules code including profiles and Units of Measurement (we can get rid of everything that deals with converting between degrees F and degrees C for example), the semantic model, etc. But all that can and probably should wait until what we have now is working on OH 3.
At a high level the process would be as follows:
Use apt to install the latest OH 3 release. The installer will make all the changes necessary to migrate the managed configs (Thing, Items, Rules) to OH 3.
Install the new GPIO binding.
Use MainUI (replacement to PaperUI) to create a new GPIO Thing and configure the channels for the four pins we use.
Recreate the PinXX Items using MainUI.
I’m reasonably confident that will work. Once that’s working on OH 3, it’s probably worthwhile to look into (in no particular order):
retrofit our Items into the semantic model
configure the Overview page in MainUI and consider removing BasicUI and the sitemap for admin and phone control
rework the Items and shell scripts to use Units of Measurement (we can standardize on degrees C everywhere and only convert to degrees F on the UI and the MQTT messages we publish to the UI), we can get rid of a bunch of Items doing this as well
rewrite the rules using the JS Scripting add-on which will let us use ECMAScript 11 instead of the ancient ECMAScript 5.1 we are using now (which will go away as a default when OH moves from Java 11 to Java 17 probably some time this year). This also comes with a helper library that provides something close to a pure JavaScript environment. This too will allow for a good deal of rules simplification.
move some of the rules config stuff out of defaults.js into Items so they can more easily be configured by end users where necessary (MainUI now has widgets to enter all sorts of stuff including free text and date times that can’t be done in sitemaps)
look into adopting Timeline - UI Widgets - openHAB Community (or something like it) to set the schedule
With the helper library that comes with the JS Scripting add-on, it’s really easy now for rules to call each other and to work with Item metadata. We should be able to use that to not only further simplify rules but also eliminate a whole bunch of Items.
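To make the Units of Measurement idea from the list above concrete: the goal is to keep one unit internally and convert only at the edge. The Python sketch below illustrates that separation only; it is not openHAB's UoM API, and both function names are hypothetical.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def format_for_display(celsius, unit="C"):
    """Format an internally stored Celsius value for the UI or an MQTT message."""
    if unit == "F":
        return f"{celsius_to_fahrenheit(celsius):.1f} °F"
    return f"{celsius:.1f} °C"
```

With this split, setpoints, sensor readings and rule comparisons would all stay in degrees C, and only the final display string depends on the user's preference.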
The sensor on my test unit gave up and I’ve unplugged it. Next time I get a chance I’ll see if I can bring it back out and run the upgrade on it to verify this, but it might take a bit so don’t wait on me.
Thank you @rlkoshak, clear as always. You are right, there were no upgrade instructions…
Will look into these the following days…
@hestia_hacker I recall the load before installing the LCD script was reasonable but now I noticed this:
Saw some open issues on LCD and framebuffer repos that may be related…
Although performance-wise, it feels fine, we should not just leave it like this.
Will double check.
Regarding the upgrade steps from rlkoshak, how would these be described/applied to your scripts?