Manual install? I want to keep some other things running on my R Pi

I read the source code for turnkey and found some clues in /home/pi/scripts/raspberry-pi-turnkey/wpa/wpa.log:

Successfully initialized wpa_supplicant
wlan0: CTRL-EVENT-SCAN-FAILED ret=-512 retry=1
nl80211: deinit ifname=p2p-dev-wlan0 disabled_11b_rates=0
p2p-dev-wlan0: CTRL-EVENT-TERMINATING
nl80211: deinit ifname=wlan0 disabled_11b_rates=0
wlan0: CTRL-EVENT-TERMINATING

In /home/pi/scripts/raspberry-pi-turnkey/wpa/test.conf I see the ESSID that I selected via the web UI, there’s a commented out line with my password: #psk="hunter2" followed by what looks like an encrypted of hashed password psk=6229...3a91.

Tracing the turnkey source code, this is the output of wpa_passphrase, which is something I had never heard of before, but based on the man page, it seems legit. Next in the source I see that there’s a hardcoded wireless interface name “wlan0” but that matches up with my working hestiapi system.

Then I took a different, smarter approach. I logged into my production hestiapi and did a git diff on turnkey in the ~/scripts/ directory. It turned out, I missed setting an important variable. I fixed this up and then booted and voila! Turnkey could get on the wifi.

So I’ve pushed that fix up to kick off a new build. However, I’m not through yet. I’m not getting the flame/snowflake/fan icons, the UI just remains on the “Off” screen. The info still doesn’t have any idea what IP address the thermostat has, but it did show it on the screen when it was booting up, so I was able to SSH into it and poke around. Openhab2 was running, and I didn’t see anything wrong in journalctl -e other than hostapd.service not being able to start (I’d expect that to be disabled by turnkey, but apparently not). hostapd wasn’t actually causing any network problems, so I just stopped it manually for now.

It’s cold, so I’m going to swap back to my working SD card for now. It might be a week or two before I’m able to get back to troubleshooting this, as life is getting between me and debugging my thermostat. :upside_down_face: When I do come back to this, I plan on looking at openhab.log and hopefully that’ll have some clues because that’s pretty much the only other place I can think of to look.

The IP on the boot screen comes from a bash script. The IP on the info screen comes from openHAB via MQTT on localhost.

Indeed it should not try to start. Is it throwing any error/reason?

openHAB does not report (I think) internal issues. Try openhab-cli showlogs

Here’s the error logs for hostapd.service:

Jan 19 23:26:43 raspberrypi systemd[1]: hostapd.service: Service hold-off time over, scheduling restart.
Jan 19 23:26:43 raspberrypi systemd[1]: Stopped Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
Jan 19 23:26:43 raspberrypi systemd[1]: Starting Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator...
Jan 19 23:26:43 raspberrypi hostapd[8751]: Configuration file:
Jan 19 23:26:43 raspberrypi hostapd[8751]: Could not open configuration file '' for reading.
Jan 19 23:26:43 raspberrypi hostapd[8751]: Failed to set up interface with
Jan 19 23:26:43 raspberrypi hostapd[8751]: Failed to initialize interface
Jan 19 23:26:43 raspberrypi systemd[1]: hostapd.service: Control process exited, code=exited status=1
Jan 19 23:26:43 raspberrypi systemd[1]: Failed to start Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
Jan 19 23:26:43 raspberrypi systemd[1]: hostapd.service: Unit entered failed state.
Jan 19 23:26:43 raspberrypi systemd[1]: hostapd.service: Failed with result 'exit-code'.

For OpenHAB, here’s the initial error:

2022-01-03 08:26:23.280 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'default.rules' is either empty or cannot be 
parsed correctly

I found that /etc/openhab/rules/default.rules was an empty file, so I copied the contents of that file over and restarted OpenHAB and then I got the same error. Here are the logs after the restart:

2022-01-10 05:15:52.227 [INFO ] [io.openhabcloud.internal.CloudClient] - Shutting down openHAB Cloud service connection
2022-01-10 05:15:52.536 [INFO ] [b.core.service.AbstractActiveService] - HTTP Refresh Service has been shut down
2022-01-10 05:15:54.115 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI
2022-01-10 05:16:16.902 [INFO ] [openhab.ui.paper.internal.PaperUIApp] - Stopped Paper UI
2022-01-10 05:16:17.050 [INFO ] [.dashboard.internal.DashboardService] - Stopped Dashboard
2022-01-19 22:19:17.700 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.items'
2022-01-19 22:19:22.334 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'mapdb.persist'
2022-01-19 22:19:31.034 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.sitemap'
2022-01-19 22:20:08.701 [INFO ] [thome.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
2022-01-19 22:20:23.174 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.rules'
2022-01-19 22:20:23.527 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'default.rules' is either empty or cannot be 
parsed correctly!
2022-01-19 22:20:50.702 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at http://172.16.0.104:8080
2022-01-19 22:20:50.746 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at https://172.16.0.104:8443
2022-01-19 22:22:02.865 [INFO ] [b.core.service.AbstractActiveService] - HTTP Refresh Service has been started
2022-01-19 22:22:11.541 [WARN ] [core.thing.internal.ThingManagerImpl] - Initializing handler for thing 'mqtt:broker:mosquitto' takes more
 than 5000ms.
2022-01-19 22:22:33.872 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to 'localhost' with clientid mosq
uitto
2022-01-19 22:23:02.386 [INFO ] [openhab.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
2022-01-19 22:23:23.468 [ERROR] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: not authorized
2022-01-19 23:31:58.432 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.rules'
2022-01-19 23:32:00.249 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homie.internal.disc
overy.Homie300Discovery@1bb0aef from  discovery topic homeassistant/# on broker mqtt:broker:mosquitto but topic not registered for listene
r. Check discovery logic!
2022-01-19 23:32:00.568 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.inter
nal.discovery.HomeAssistantDiscovery@1f4ba9 from  discovery topic +/+/$homie on broker mqtt:broker:mosquitto but topic not registered at a
ll. Check discovery logic!
2022-01-19 23:32:00.599 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.inter
nal.discovery.HomeAssistantDiscovery@1f4ba9 from  discovery topic homeassistant/# on broker mqtt:broker:mosquitto but topic not registered
 at all. Check discovery logic!
2022-01-19 23:32:00.628 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.inter
nal.discovery.HomeAssistantDiscovery@1f4ba9 from  discovery topic +/+/$homie on broker mqtt:broker:mosquitto but topic not registered at a
ll. Check discovery logic!
2022-01-19 23:32:09.564 [WARN ] [el.core.internal.ModelRepositoryImpl] - Ignoring file 'default.rules' as we do not have a parser for it.
2022-01-19 23:32:11.188 [INFO ] [io.openhabcloud.internal.CloudClient] - Shutting down openHAB Cloud service connection
2022-01-19 23:32:11.282 [INFO ] [b.core.service.AbstractActiveService] - HTTP Refresh Service has been shut down
2022-01-19 23:32:13.326 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI

What step in the manual install process should be populating the default rules file? That might be a clue as to where things are going wrong for me. I can upload the image for others to look at, but it’s currently about 4GB, so I’ll need some place to upload it that will allow such file sizes.

I believe the rules file was kept empty till openHAB was fully loaded and then the file was copied. Keep in mind that there are 2 places where rules are stored. First is the DB that you can access from PaperUI and second is the file you mention and is empty (moved/hidden) at the beginning till the Initialization rule prepares OH. Google Drive, although I’m not a big fan, allows such files.

If I recall correctly, and I’ve been steeped in OH 3.x for over a year now, that error is not unusual. Every time a file is changed there are two events, one with the file starts to change and one when it completes. Unfortunately both events trigger OH to try to read the file and the file is basically empty on the first event. So you’d see this error and then it’d be followed by a successful load of the file.

If you see "Loading model ‘default.rules’ instead of another error, you know that it loaded correctly.

However, the “Ignoring file ‘defaul.rules’ as we do not have a parser for it” message is a new one for me. That implies that there is something wrong with the rules engine, but that’s just there by default so I’m not sure what would cause that error. Did you kill OH immediately after seeing that other error? That could account for it. The rule engine was killed and OH started closing down before the rules file was parsed.

Remove the .rules file from OH’s folder and restart OH. This time give it a chance to fully come up before copying the file over. There might be stuff getting in the way.

It’s not part of the install process. It’s part of the startup process in kiosk-xinit.sh if I remember correctly.

Because the RPi 0 is really under powered to run OH, and parsing a .rules file is really heavy on the
CPU, delaying that step until everything else starts up actually ended up saving a significant amount of time on the boot process which, before I started was well over five minutes.

All that this default.rules file does is set a flag (in an Item) to indicate that OH and all the rest of the HestiaPi system has finished coming online which will allow the rules that actually drive the thermostat.

The PaperUI rules (written in a dialect of ECMAScript 5.1 called Nashorn) can be found in /var/lib/openhab/jsondb. Moving most of the rules out of .rules files to this saved minutes on the boot time.

Winter is a bad time to be taking one’s thermostat offline, so progress continues to be slow, but I did some static analysis and found that on a working HestiaPi, the hostapd systemd service is “active (exited)”. I wanted to understand why it exited cleanly on the v1.2 image but didn’t do so on my self-built image.

Systemd lists /etc/init.d/hostapd as the module it’s loading. I compared that file on each of the two image, and it’s identical, as is /etc/defaults/hostapd. There’s one line that sticks out to me here, which is this:

[ -n "$DAEMON_CONF" ] || exit 0

The -n check is “True if the length of string is nonzero.”. DAEMON_CONF is set to an empty value on line 19, which explains why the production hestiapi is exiting cleanly. It does not explain why my build does not exit cleanly.

So I did what any good hacker would do: grabbed an extra hoodie and turned off the heat so I could test a new image on my open source thermostat. :grin:

Sure enough, the hostapd service on my image is NOT /etc/init.d/hostapd, but rather /lib/systemd/system/hostapd.service, which has no such checks for whether DAEMON_CONF is empty or not (in fact, DAEMON_CONF isn’t even a thing in my .service file). So that solves that mystery.

Masking the hostapd service, removing the hostapd.service file in /lib/systemd/system and then unmasking it caused the correct service file to be picked up and starting the service acted just like the production hestiapi. That should resolve the hostapd issue.

Yeap… that’s the way :stuck_out_tongue:
well done!

After fixing the hostapd service, openhab seems to inexplicably be in a better place. I now get the flame/snowflake/fan icons and the temperature set point is showing with the default value of 70F.

While better, it’s still not up to the bar of “functional” yet, as the temperature and humidity are both showing up as zeros. In the logs, there’s a clue…

2022-01-29 19:31:58.864 [WARN ] [.transport.mqtt.MqttBrokerConnection] - Failed subscribing to topic hestia/external/+/humidity
java.util.concurrent.CompletionException: com.hivemq.client.mqtt.exceptions.MqttSessionExpiredException: Session expired as connection was closed.
...
2022-01-29 19:40:09.422 [INFO ] [marthome.model.script.initialization] - System is ready to operate, kicking off restored behaviors
2022-01-29 19:40:10.893 [ERROR] [e.automation.internal.RuleEngineImpl] - Failed to execute rule '32223121-5acf-423f-a9f5-1dffbe665927': Fail to execute action: 2

Of course, the 32223121-5acf-423f-a9f5-1dffbe665927 rule! Why didn’t I think of that? Thanks to @rlkoshak’s tips on where I can fidn the rules file, I was able to grep my way to finding the culprit. It’s in /var/lib/openhab2/jsondb/automation_rules.json and the script is… large, but it’s the one labeled “Initialize all the setting Items” (in case that UUID was generated at runtime).

I found this rule in the PaperUI and ran it there so I could see what shows up in the logs. Here’s what I got:

2022-01-29 19:51:02.567 [INFO ] [marthome.model.script.initialization] - Comfort_Mode is COMFORT, Default value is 1
2022-01-29 19:51:02.881 [INFO ] [marthome.model.script.initialization] - Comfort_Value is 1, initializing to 1
2022-01-29 19:51:04.983 [INFO ] [marthome.model.script.initialization] - System is ready to operate, kicking off restored behaviors
2022-01-29 19:51:05.936 [WARN ] [e.automation.internal.RuleEngineImpl] - Fail to execute action: 2
java.lang.IllegalArgumentException: The argument 'command' must not be null.
        at org.eclipse.smarthome.core.events.AbstractEventFactory.checkNotNull(AbstractEventFactory.java:117) ~[?:?]
        at org.eclipse.smarthome.core.items.events.ItemEventFactory.assertValidArguments(ItemEventFactory.java:390) ~[?:?]
        at org.eclipse.smarthome.core.items.events.ItemEventFactory.createCommandEvent(ItemEventFactory.java:222) ~[?:?]
        at org.eclipse.smarthome.core.items.events.ItemEventFactory.createCommandEvent(ItemEventFactory.java:238) ~[?:?]
        at org.openhab.core.automation.module.script.internal.defaultscope.ScriptBusEvent.sendCommand(ScriptBusEvent.java:94) ~[?:?]

The code right after the Comfort_Mode and Comfort_Value gets printed to the logs is related to the humidity, which lines up with the error seen earlier.

def = DEFAULTS.get("Humi_DEF");
min = DEFAULTS.get("Humi_MIN");
max = DEFAULTS.get("Humi_MAX");
initSetpoint("HumiSetpoint", 50, 100, 0);

I’m not really sure where to go from here. I know that the hardware in this pi is working correctly, because everything works fine when I switch back to v1.2. Any ideas on what’s going on here or how I can debug further?

The temp/humi values are obtained by executing a script. For OH to be able to call a script the “Exec” binding had to be installed. Can you make sure this is installed and functioning ok?
I also remember there was an extra security need that each script had to be included in a whitelist but this should not be needed anymore (HestaPi stuck at 0°-screen (no. 15 of First boot ONE-tutorial) - #16 by rlkoshak).

Scripts to pull data were not run properly. TL;DR the i2c_dev kernel module wasn’t loaded.

pi@raspberrypi:~/scripts $ ./getBMEtemp.sh 
Traceback (most recent call last):
  File "/home/pi/scripts/bme280.py", line 30, in <module>
    bus = smbus.SMBus(0) # Rev 2 Pi, Pi 2 & Pi 3 uses bus 1
IOError: [Errno 2] No such file or directory

It’s not an ImportError, but I checked to see if I had the library anyway. I do.

pi@raspberrypi:~/scripts $ apt list python*smbus*
Listing... Done
python-smbus/oldoldstable,now 3.1.2-3 armhf [installed]
python3-smbus/oldoldstable 3.1.2-3 armhf

I took a look at the library and I see that it appears to be using FFI to call into native code (the shared object file).

pi@raspberrypi:~/scripts $ dpkg -L python-smbus
/.
/usr
/usr/lib
/usr/lib/python2.7
/usr/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages/smbus.arm-linux-gnueabihf.so
/usr/share
/usr/share/doc
/usr/share/doc/python-smbus
/usr/share/doc/python-smbus/changelog.Debian.gz
/usr/share/doc/python-smbus/changelog.gz
/usr/share/doc/python-smbus/copyright

I thought “but Hestia, how is this any different than the dev 1.2 image, which works just fine?” Great question, Hestia, glad you asked. That one is using an older version of the library, 3.1.1+svn-2 instead of 3.1.2-3.

Now, why would a newer version appear to be completely broken, I have no idea. Lets keep digging, shall we? I ran the python script under strace -f and found that at some point it was trying to open /dev/i2c-1 and failing. I looked in /dev and found that there wasn’t any i2c* devices in there. Comparing that to a pi running the 1.2-dev image and I could see that this device file should exist.

Awesome. Looking at the kernel modules reveals that I do have i2c_bcm2835 on my image, but not i2c_dev. Looking back at the manual installation instructions, I don’t see anything about making sure this kernel module is installed or loaded.

OK, fine. “Do we even have this kernel module?”, I wondered. I took a long shot and figured I’d just blindly try to load it without checking to see if it exists first. What could go wrong, right? I ran modprobe and… no output. Could it have been that that easy? Yup!

pi@raspberrypi:~/scripts $ sudo modprobe i2c_dev
pi@raspberrypi:~/scripts $ lsmod | grep i2c
i2c_dev                 7171  0
i2c_bcm2835             7818  0
pi@raspberrypi:~/scripts $ ls -l /dev/i2c-1
crw-rw---- 1 root i2c 89, 1 Mar  8 00:18 /dev/i2c-1
pi@raspberrypi:~/scripts $ ./getBMEtemp.sh 
67

I added it to /etc/modules and rebooted to verify that was all that was needed and it worked perfectly.

After this, the LCD interface shows the temperature and everything is good. I hope that someday someone is having trouble with smbus not working as expected and they find this post that shows them how to troubleshoot the issue and get it resolved. It’ll be like the opposite of xkcd: Wisdom of the Ancients :rofl:

1 Like

@hestia_hacker wow. This is great news and very thorough troubleshooting!

OK, I have an update. It’s a mixed bag.

On the plus side, I have gotten the build process to work and split it up into jobs that take no longer than 3 hours each. This was required by gitlab.com, as their CI limits each job to 3 hours regardless of how many CI minutes are left of your account. This means we can build the 1.2-dev image for jessie, stretch, or buster. The point of going through all of this work was to have a process to reliably create a clean disk image, and we do have that.

The bad news is that I’ve since learned about gitlab.com’s 1GB limit on artifacts, which is not something that can be easily worked around. This limit also applies to all subscription levels, so having the Ultimate package doesn’t help here either. I’ve contacted the support team and they have confirmed there’s no way to increase it.

I’d really like us to have an automated build process, with the resulting image on a publicly accessible server as well so users could just go there download the latest image. At the same time, anyone who wanted to contribute to the build process, could create a branch and use this same infrastructure so we could test each others experimental builds.

At this point, I’m going to ask some friends if they have any CI/CD solutions that they might be willing to let us use (for free) to get what we actually want. It’s for a good cause, so hopefully someone will be willing to take on the cause.

1 Like

That’s great news! I’ve pinned this topic for a month to get maximum attention in the forum in case someone here can help.

Are the any shareable image files from your work we could have a look and test as a community?

Here’s a happy image:

I realized that I forgot to clear out ~/.ssh/authorized_keys file, so I’ll need to update the scripts to do that. I’ll also probably make the very last step to clean up anything that might be left over in the home directories that shouldn’t be there and then fill the free space on the disk with zeros (so it’ll compress better).

There’s likely to be some other little things that need to be done, but hopefully this revision is at least functional.

Hopefully we’ll find a host because Mega only gives me 20G of free storage, and that just used about 6 of them. :laughing:

I did try the buster image and it looks like openHAB, sensor and relays work but the LCD doesn’t. LCD on Stretch image on the other hand, did work. Any idea from the top of your head @hestia_hacker why that might be? Everything seemed ok but I didn’t do a thorough test in case the image will need modifications…

I’m not sure off the top of my head. There’s a jessie specific hack related to the wifi, but the section about the LCD software looks like it’s all stock sutff.

The docs on touch screen support only seem to talk about the kind that attach via the ribbon cable.

I wouldn’t be surprised if the LCD-show package goes away from raspberrypi.com/Downloads at some point as it appears to be about 5 years old. I found a repo on GitHub that might be that same LCS-show package. That was updated a month ago, so it seems like trying the latest version is a logical first step.

I’ll try to reproduce your results and then see if I can get it working on Buster. Glad to hear that stretch passed a smoke test. Spring and Fall are a great time to take my thermostat offline. :laughing:

Got the LCD to display and touch screen register points correctly (although not 100% precise). I’m not entirely sure where my PR should go and how to actually test it as I’m not familiar with the CI/CD environment so please excuse me for pasting below the commands I entered on the Buster image:

sudo rm -rf LCD-show
git clone https://github.com/goodtft/LCD-show.git
chmod -R 755 LCD-show
cd LCD-show/
# We don't want to reboot right now, so we'll patch that part out 
sed -i 's/^sudo reboot/#sudo reboot/' ./LCD35-show
sudo ./LCD35-show

As that last sudo ./LCD35-show removes our commands from autostart, we need to readd them

grep raspberry-pi-turnkey /etc/rc.local || \
(grep -v "exit 0" /etc/rc.local; echo "su pi -c '/usr/bin/sudo /usr/bin/python3 /home/pi/scripts/raspberry-pi-turnkey/startup.py &'"; echo "su -l pi -c 'sudo xinit /home/pi/scripts/kiosk-xinit.sh'"; echo "exit 0") > rc.local && sudo mv rc.local /etc/rc.local
sudo chmod +x /etc/rc.local

Replace contents of /etc/X11/xorg.conf.d/99-calibration.conf
with below block

Section "InputClass"
        Identifier      "calibration"
        MatchProduct    "ADS7846 Touchscreen"
        Option  "Calibration"   "3934 252 1298 3563"
EndSection

And reboot…

Next thing I noticed missing is the info on the countdown-loading screen (IP and MAC). I have an idea I’ll check another time.
Then next big stop is upgrading to latest openHAB and all our configuration, rules etc… Rich had some good guidelines for that.

I put your change in the build scripts and built an image. When I put it on an SD card and booted it, I was able to get to the “connect to the AP and enter your creds” page and that seemed to work. When it rebooted, I got to a screen where the left side just said “OFF” and the flame, snowflake and fan icons didn’t do anything.

The info button did work though, and I saw it was successfully getting on the WiFi. I connected to port 8080 and got a 403-Forbidden error from Jetty. I waited a bit longer and got the same error. I don’t have time to look into it further right now because our guests will be arriving shortly, but I’ll report back when I dig in and figure out what’s failing.