Daemonizing drivers, Python, MQTT and web scraping

I’ve been making some excursions over the last 12 months to figure out the best way to replace my Windows-based home automation system. It has grown rather large over the years, and I want to get rid of its single point of failure: the one Windows executable that does it all. A few weeks ago I tried some things with a Raspberry Pi (RPi) and Python and was very much surprised at how quickly I could build something from scratch, in a matter of hours.

The FEZ Panda II and the Simplecortex are both nice and may be able to do the job just as well, but the RPi is just awesome compared to those two, for reasons I don’t think I have to explain.

Yesterday afternoon was the first time (after a long, long week full of all kinds of problems) that I could sit down behind my PC again and spend some time on things I like to do. That afternoon I tried to get a Python script to run as a daemon. Running a Python script from the command line is not very useful to me (only for testing and debugging), because what I really want is for the RPi to start all the tasks I want it to run automatically at boot time: just power up the RPi and the rest happens automagically, with no user interaction needed. So I decided to combine three subjects in this small experiment: a Python MQTT client, daemonizing Python scripts and web scraping. Let’s see how far I can get.

I started with a brand new RPi, a new SD card and a fresh image written to it, and did the usual things like setting the hostname, time zone et cetera. After that I installed some Python-related tools (e.g. for installing modules) to make life easier. An MQTT client written in Python wasn’t very hard to find: Mosquitto (the open-source MQTT broker I’ve been using since September last year) has one, so I installed that too. I searched for information on how to create a daemon and found the python-daemon library. The last things I needed were some tools to make web scraping somewhat more comfortable: regular expressions (Python’s re module) and the requests library. I found some examples of how to use all of these libraries and tools and started coding.

Well, is this something you can really call coding? Reading and understanding the examples I found, copying code snippets, adding some extra lines, deleting others – it feels more like blending actually 😉

I use web scraping to show various kinds of information on the user interfaces in our house, such as the number and total length of traffic jams, a 2-hour rain forecast (BBQ!) for our specific location and so on, so it seemed like a good idea to see whether I could make one of those scrapers run in a Python-based daemon on the RPi.

3 hours later I was finished, with this code as a result:

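```python
import logging
import re
import time

import mosquitto           # Mosquitto's own Python MQTT client module
import requests
from daemon import runner  # python-daemon


class App():

    def __init__(self):
        # DaemonRunner reads these attributes from the app object.
        self.stdin_path = '/dev/null'
        # /dev/tty is handy while testing; opening it fails when there is no
        # controlling terminal (e.g. when started at boot), so use /dev/null there.
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/var/run/webfetcherd.pid'  # writing here needs root
        self.pidfile_timeout = 5

    def on_connect(self, mosq, obj, rc):
        # rc == 0 means the broker accepted the connection
        logger.info("on_connect:" + str(rc))

    def on_message(self, mosq, obj, msg):
        logger.info(msg.topic + " " + msg.payload)

    def de_html(self, html):
        # Remove HTML tags and the &nbsp; and &amp; entities
        pattern = re.compile("<.*?>|&nbsp;|&amp;", re.DOTALL | re.M)
        return pattern.sub("", html)

    def run(self):
        while True:
            # (Re)connect; we end up here again when the inner loop reports an error
            self.mqttc = mosquitto.Mosquitto("webfetcher")
            self.mqttc.on_message = self.on_message
            self.mqttc.on_connect = self.on_connect
            self.mqttc.connect("192.168.10.40", 1883, 60)
            rc = 0
            prvtime = 0.0
            while rc == 0:
                # Handle MQTT network traffic; a non-zero rc means the connection broke
                rc = self.mqttc.loop()
                # Scrape at most once every 60 seconds
                if (time.time() - prvtime) > 60:
                    prvtime = time.time()
                    rq = requests.get('http://m.fileindex.nl/files.js')
                    rex = re.compile('"(.*)"')
                    m = rex.search(rq.text)
                    if m:
                        res = self.de_html(m.group())
                        logger.info('Match: %s', res)
                        self.mqttc.publish("test/trafficjams", str(res))
                    else:
                        logger.info('NO match: %s', rq.text)


# Log to a file. The handler is opened before daemonizing, so its stream must be
# listed in files_preserve, or DaemonContext closes it when it detaches.
logger = logging.getLogger("Webfetcher")
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler = logging.FileHandler("./webfetcher.log")  # relative to the start directory
handler.setFormatter(formatter)
logger.addHandler(handler)
app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.daemon_context.files_preserve = [handler.stream]
daemon_runner.do_action()  # start, stop or restart, taken from the command line
```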
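The web-scraping part of the script above boils down to two regular expressions: one that grabs the first double-quoted string from the JavaScript file, and one that strips HTML tags and the `&nbsp;`/`&amp;` entities. The sketch below isolates that step from the network so it can be tried on any string. The function name `extract_text` and the sample input are hypothetical; the sample is made up, not the real contents of files.js.

```python
import re

# Same patterns as the daemon: the first double-quoted string (greedy, so it
# runs to the last quote on the same line)...
QUOTED = re.compile('"(.*)"')
# ...and HTML tags plus the &nbsp; and &amp; entities, which de_html deletes
TAGS_AND_ENTITIES = re.compile("<.*?>|&nbsp;|&amp;", re.DOTALL | re.M)


def extract_text(js_source):
    """Hypothetical helper mirroring the daemon's search + de_html steps.

    Returns the first quoted string with HTML removed, or None if there is none.
    """
    m = QUOTED.search(js_source)
    if not m:
        return None
    # m.group() includes the surrounding quotes, just like in the daemon;
    # m.group(1) would return only the text between them
    return TAGS_AND_ENTITIES.sub("", m.group())


# Made-up input, not the real contents of files.js
sample = 'var msg = "<b>3</b> jams, <i>25 km</i> total";'
print(extract_text(sample))  # prints: "3 jams, 25 km total"
```

Because `&nbsp;` is deleted rather than turned into a space, `files&nbsp;indexed` would come out as `filesindexed`; if that matters, decoding entities with `html.unescape` from Python 3’s standard library after stripping the tags is one alternative.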

Starting the daemon
Done! OK, not completely: I haven’t made an init script yet, I don’t really like the way the MQTT client is called (the self.mqttc.loop() call at the top of the inner while loop), and there are some other things that can probably be done better, especially once I/O with hardware connected to the RPi is involved. But for this goal, and as the result of a first alpha version, this is OK, I guess.
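One way to untangle the MQTT polling from the scraping is to move the once-a-minute scrape onto its own thread, so the inner loop of run() only has to call `self.mqttc.loop()`. This is a sketch under assumptions, not the author’s code: `PeriodicScraper`, `fetch` and `publish` are hypothetical names, where `fetch` would do the `requests.get` plus regex work and `publish` would wrap `self.mqttc.publish(...)`. It also assumes the MQTT client library tolerates `publish()` being called from a second thread, which is worth checking in its documentation first.

```python
import logging
import threading
import time


class PeriodicScraper(object):
    """Hypothetical helper: call fetch() every `interval` seconds on a
    background thread and hand every non-None result to publish()."""

    def __init__(self, fetch, publish, interval=60.0):
        self.fetch = fetch
        self.publish = publish
        self.interval = interval
        self.stopped = threading.Event()
        self.thread = threading.Thread(target=self._run)
        self.thread.daemon = True  # don't keep the process alive on exit

    def start(self):
        self.thread.start()

    def stop(self):
        self.stopped.set()  # wakes up the wait() in _run immediately
        self.thread.join()

    def _run(self):
        while not self.stopped.is_set():
            try:
                result = self.fetch()
            except Exception:
                # A failed scrape is logged and retried on the next round
                logging.exception("scrape failed")
            else:
                if result is not None:
                    self.publish(result)
            # Sleep for `interval` seconds, or less if stop() is called
            self.stopped.wait(self.interval)


if __name__ == "__main__":
    # Stand-ins for the real scraper and the MQTT publish call
    received = []
    scraper = PeriodicScraper(fetch=lambda: "3 jams", publish=received.append,
                              interval=0.1)
    scraper.start()
    time.sleep(0.35)
    scraper.stop()
    print(received)
```

In the daemon, the scraper would be started once, before the reconnect loop, leaving only `rc = self.mqttc.loop()` in the inner while loop.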

On to alpha2!


4 Responses to Daemonizing drivers, Python, MQTT and web scraping

  1. Xose says:

    Hi!
    I’ve been playing with mosquitto, python and MQTT for the last few months. My setup is very much the same you want. So far all my sensors have an XBee so the core of my software is a xbee2mqtt daemon that receives, parses and publishes the messages from the different radios. Then, a bunch of other daemons use that information (mqtt2mysql, mqtt2cosm,…).
    I think you may find it interesting to take a look at the code and my blog…

    • Hi Xose,
      Thanks for pointing me in the right direction; your xbee2mqtt code can be very useful for me to learn more, especially since my Python experience is still < 10 hours 😉 And it’s funny to hear we have a lot of things in common: sensors with XBees, MQTT, and now Python... I’m gonna spend some time reading your blog, thanks!

  2. Rene Klootwijk says:

    I would go for node.js instead of Python. I ventured into Python in combination with the Twisted asynchronous framework but found node.js to be a better fit for event-based application development.

    • I know what you mean, and I know you’re totally into node.js right now 😉 And I agree, the script looks a bit ‘odd’ when you’re used to an event-driven environment. And Twisted seemed like overkill for these kinds of small jobs. But don’t worry, node.js will be revisited in the near future; it’s high on my to-do list 😉
