Opentherm Gateway operational

Opentherm Gateway and SimpleCortex side by side

Today I finished the monitoring part of the OpenTherm Gateway. The code running on the Simplecortex had proven to be reliable enough to start using it, so why not do just that 🙂

The OpenTherm Gateway you see on the right is my older one, built earlier this year; it will be replaced by the new one once that’s finished.

Before I could start on the OpenTherm Gateway, I had to fix some issues which were the result of what I did last week. The MQTT driver was not as stable as I hoped; it worked fine for several hours, but it never made it longer than 48 hours, so I rearranged some code and it has been running fine since then.

Another thing I had to fix was that the Touchscreen, which had been turned into an MQTT client, was disconnected from the broker by the time I came home from work. The reason: to save energy, the Touchscreen goes into hibernation when nobody’s at home, which leads to the broker closing the connection! Didn’t think of that one. Before, the Touchscreen was a UDP listener, in which case there is no real connection. So I added a timer to the Touchscreen application that periodically checks the connection and re-establishes it when needed.

Back to the OT Gateway. The largest part of the work that still had to be done was preparing my Domotica system. The MQTT topics were already being received, but I had to do something about the payloads. The Opentherm Gateway produces a lot of bit values, for instance CH mode (Central Heating mode): payload ‘0’ means not active, ‘1’ means active. Well, I’m not going to pester users with terms like ‘ch_mode 1’ or ‘dhw_mode 0’; ‘Central Heating mode active’ or ‘Hot water not active’ sounds more like it, right… But again, one of my goals is not to put texts like those hard-coded in any piece of code – not on the Simplecortex which publishes the information, nor in a Touchscreen app, or webpage or whatever.

So I felt really lucky that I started thinking about those things a long time ago and introduced so-called ‘Informationtypes’ in my system at a very early stage. In a few words, an informationtype can transform a value that represents whatever into something a human can understand. For example, with a door sensor ‘0’ will be translated to ‘closed’ and ‘1’ becomes ‘open’; but the same ‘0’ becomes ‘off’ in the case of an appliance module. My Domotica system already does that for me and exposes both the ‘raw’ values (0/1 in case of booleans, unformatted data in case of numeric values, etc.) and the ‘formatted’ values (on/off, open/closed, floats with the right number of decimals). Raw data is nice, for computers. But not for humans! Currently, there are 150 informationtypes in my system (in the database, of course) ranging from on/off for a switch to the selected input on my AV receiver (0=“BD/DVD”, 1=“CBL/SAT”…).
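To make the idea concrete, here is a minimal sketch in C of such a lookup. The real informationtypes live in a database and are not shown in this post, so the enum `InfoType`, the table `kBoolText` and the function `format_bool` are all hypothetical names, and only boolean payloads are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical information types; the real ones are rows in a database. */
typedef enum { IT_OPEN_CLOSED, IT_ON_OFF, IT_YES_NO, IT_COUNT } InfoType;

/* One row per information type: display text for raw "0" and raw "1". */
static const char *const kBoolText[IT_COUNT][2] = {
    [IT_OPEN_CLOSED] = {"closed", "open"},
    [IT_ON_OFF]      = {"off", "on"},
    [IT_YES_NO]      = {"No", "Yes"},
};

/* Translate a raw boolean payload ("0" or "1") into display text.
   Returns NULL for an unknown type or payload, so the caller can
   fall back to showing the raw value. */
const char *format_bool(InfoType type, const char *raw)
{
    assert(raw != NULL);
    if ((int)type < 0 || (int)type >= IT_COUNT) return NULL;
    if (strcmp(raw, "0") == 0) return kBoolText[type][0];
    if (strcmp(raw, "1") == 0) return kBoolText[type][1];
    return NULL;
}
```

With this table the same raw ‘0’ yields ‘closed’ for `IT_OPEN_CLOSED` and ‘off’ for `IT_ON_OFF`, which is the behaviour described above.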

So all I had to do was publish those formatted values as well, and I could present them on a webpage and display something really meaningful … a new MQTT root hierarchy was born: /value.

And doing things this way enabled me to reduce the code for the OpenTherm Gateway Device class to (almost) nothing more than this (the rest is all constants and type defs):

  case DataInfo.Format of
    flag8: Pin(iTopic).AsBoolean:=(Payload='1');
    u8,s8,u16,s16: Pin(iTopic).AsInteger:=StrToInt(Payload);
    f8_8: Pin(iTopic).AsFloat:=StrToFloatX(Payload);
    dowtod:begin
      { TODO : Finish }
    end;
  end;  //case

So now, when the central heating starts working, there’s (my system being the point of view) “/sensor/OTGW/flame 1” coming in and “/value/OTGW/flame Yes” going out. Now I can make a webpage without displaying meaningless numbers or having to translate those numbers locally that don’t (have to) mean anything to a human.
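The topic side of that translation is just a prefix swap. A small sketch in C, assuming the formatted topic keeps everything after `/sensor/` unchanged; the function name `value_topic` is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite "/sensor/<rest>" into "/value/<rest>".
   Returns 0 on success, -1 when the topic is not under /sensor/
   or the output buffer is too small. */
int value_topic(const char *sensor_topic, char *out, size_t out_size)
{
    static const char kIn[]  = "/sensor/";
    static const char kOut[] = "/value/";
    const size_t in_len = sizeof kIn - 1;

    assert(sensor_topic != NULL && out != NULL);
    if (strncmp(sensor_topic, kIn, in_len) != 0) return -1;

    int n = snprintf(out, out_size, "%s%s", kOut, sensor_topic + in_len);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```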

And here’s the result, showing parts of the information gathered with the OpenTherm Gateway.

 

Same Opentherm Gateway, different approach

Yep, I’m going to build a 3rd Opentherm Gateway 😉
The first one was built on perfboard and became a mess; I never finished it, because before I could, the Opentherm Gateway PCB became available, which looked much better. The first PCB version was built somewhere in February 2012 and has been in use since August, after I replaced the Proliphix NT20e with a Honeywell modulating thermostat.

There’s only one thing I don’t really like that much about the Opentherm Gateway and that’s the serial interface. I like to have all my hardware network-attached, but I’d also like to keep the number of Serial to Ethernet servers to a minimum – right now, I have 6 Serial ports in the meter cabinet (PLCBUS, X10, GSM modem, etcetera) connected to a Serial to Ethernet server. In total, I think I have around 10 serial devices connected that way.

For the third Opentherm Gateway I’m going to try something else; I’m going to leave the MAX232 IC off the PCB and directly connect a Simplecortex to the TTL level output of the PIC that’s on the OT Gateway PCB. This approach has some benefits, like

  • The Opentherm Gateway becomes Ethernet enabled;
  • Programmable TTL to Ethernet conversion;
  • Less €€€;
  • Big reduction in network traffic;
  • more fun!

Yesterday I started with the Simplecortex firmware for the OT Gateway. I want the firmware to be versatile, meaning that it should not only support the Data ID’s that I see going back and forth between my thermostat and boiler. Fortunately the Opentherm Protocol is documented well enough to be able to write code for Data ID’s I’ve never seen in my life. In fact, the definition of how the firmware should deal with all the Data ID’s found in the OT protocol 2.2, comes down to the following:

typedef enum {Read, Write, Both} CommandType;
typedef enum {both, highbyte, lowbyte} ByteType;
typedef enum {flag8, u8, s8, f8_8, u16, s16, dowtod} PayloadType;

typedef struct{
	uint8_t ID;
	CommandType rw;
	ByteType whichbyte;
	PayloadType format;
	uint8_t bitpos;
	const char* topic;

} OTInformation;

OTInformation OTInfos[] = {
{0x00,	Read, 	highbyte, flag8, 0, "ch_enable"},
{0x00,	Read, 	highbyte, flag8, 1, "dhw_enable"},
{0x00,	Read, 	highbyte, flag8, 2, "cooling_enable"},
{0x00,	Read, 	highbyte, flag8, 3, "otc_active"},
{0x00, 	Read, 	highbyte, flag8, 4, "ch2_enable"},
...
...
{0x7f, 	Read, 	highbyte, u8,    0,  "slaveproducttype"},
{0x7f, 	Read, 	lowbyte,  u8,    0,  "slaveproductversion"}
};

That’s all there is to it… add about 150 lines of code and the complete set of Data ID’s defined in the OT Protocol 2.2 documentation is supported – no more switch (DataID) with a long list of cases and repeating code…
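As an illustration of how one table row can drive the decoding, here is a sketch in C. It is not the actual firmware: the function `ot_decode` and its double return type are assumptions for this example, the two enums are repeated from above so the block compiles on its own, and `dowtod` is left undecoded.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {both, highbyte, lowbyte} ByteType;
typedef enum {flag8, u8, s8, f8_8, u16, s16, dowtod} PayloadType;

/* Decode the 16-bit data value of one OT frame according to one table row.
   Everything is returned as a double so a single return type fits all
   formats. dowtod is not handled in this sketch and yields 0. */
double ot_decode(uint16_t data, ByteType whichbyte, PayloadType format,
                 uint8_t bitpos)
{
    uint8_t hi = (uint8_t)(data >> 8);
    uint8_t lo = (uint8_t)(data & 0xFF);
    uint8_t byte = (whichbyte == lowbyte) ? lo : hi;

    switch (format) {
    case flag8: assert(bitpos < 8); return (byte >> bitpos) & 1;
    case u8:    return byte;
    case s8:    return (int8_t)byte;
    case u16:   return data;
    case s16:   return (int16_t)data;
    case f8_8:  return (int16_t)data / 256.0;  /* signed fixed point, 8.8 */
    default:    return 0.0;
    }
}
```

A loop over `OTInfos` would call this once for every row whose `ID` matches the Data ID of the incoming frame and publish the result under that row's `topic`.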

But I wanted more – I really don’t need to be told that CH is still enabled twice a second; just a report of a change will do. For that I’m going to add a filter that will only report changes. That filter will reduce the network traffic immensely. And of course this Opentherm Gateway will be transformed into an MQTT Publisher, just like my smart meter. And last but not least, this implies that the Opentherm Gateway will act as an MQTT subscriber too, so that I can control the behavior of the OT Gateway and override the thermostat’s temperature setpoint.

Right now, I’m watching the OT Gateway information  on my screen as it is being published:

Opentherm Gateway publishing information

There’s still a lot to do, but the fact that I managed to get this far in only a few hours makes me confident that I can finish this project before it starts to get really cold.

Another good thing is that once this project is finished I can shut down the Remeha Calenta driver which has been running for 2 years.

The biggest disadvantage of that driver was that I had to constantly poll the Calenta and that it was based on a protocol specifically targeted at Remeha boilers.

So it all gets much, much better this way!

 

Opentherm Gateway statistics

I had a different post in mind (smart meter follow-up) but hey, if questions arise and I’m interested in the answers just as much, I’m flexible 😉

The question to be answered was: how ‘fresh’ is the data that is travelling from the slave (boiler) to the master (thermostat)?

Is it 30 seconds, as suggested in a comment from Maurice? Quick answer: no.

I added some code to my OT_Decoder tool to collect some statistical data about the Opentherm (OT) frames travelling back and forth and I concentrated on a single Message Type, the Read-Ack.

This message type travels from boiler to thermostat. It is in fact the response from the slave to a read request from the master, and it can contain values for lots of things like status flags, modulation level, return water temperature etcetera. All these different types of values have been given a so-called Data-ID; in decimal, status flags = 0, modulation level = 17 (hex 11), return water temperature = 28 (hex 1C), and so on. The protocol has room for 256 different Data-IDs.

So when the master sends a Read-Data request to the slave for a particular Data-ID, the slave responds with a Read-Ack frame that holds the Data value.
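For reference, picking a frame apart is a matter of a few shifts. The sketch below follows the 32-bit layout from the OT protocol documentation (parity bit, 3-bit message type, 4 spare bits, 8-bit Data-ID, 16-bit data value, even parity over the whole frame); the struct and function names are made up for this example and are not taken from my OT_Decoder tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Message types used in this post; 0 and 1 come from the master,
   4 and 5 from the slave. */
enum { OT_READ_DATA = 0, OT_WRITE_DATA = 1, OT_READ_ACK = 4, OT_WRITE_ACK = 5 };

typedef struct {
    uint8_t  msg_type;    /* bits 30..28 */
    uint8_t  data_id;     /* bits 23..16 */
    uint16_t data_value;  /* bits 15..0  */
} OTFrame;

/* True when the 32-bit frame holds an even number of 1 bits. */
bool ot_parity_ok(uint32_t raw)
{
    int ones = 0;
    for (; raw != 0; raw >>= 1) ones += (int)(raw & 1u);
    return (ones % 2) == 0;
}

/* Split a raw frame into its fields; bits 27..24 are spare and ignored. */
OTFrame ot_parse(uint32_t raw)
{
    OTFrame f;
    f.msg_type   = (uint8_t)((raw >> 28) & 0x7);
    f.data_id    = (uint8_t)((raw >> 16) & 0xFF);
    f.data_value = (uint16_t)(raw & 0xFFFF);
    return f;
}
```

A frame such as `0x401C2A80` would then parse as a Read-Ack for Data-ID 1C with data value `0x2A80`.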

Here are the statistics I collected:

00 :   0,4 seconds, 4678 times,  5 changes, mintime     4
05 :  57,8 seconds,   66 times,  0 changes, mintime
11 :  59,6 seconds,   64 times,  0 changes, mintime
12 :  58,6 seconds,   65 times,  0 changes, mintime
19 :   4,2 seconds,  787 times,195 changes, mintime     3
1C :  59,5 seconds,   63 times, 55 changes, mintime    59
74 : 235,8 seconds,   16 times,  0 changes, mintime
75 : 235,5 seconds,   15 times,  1 changes, mintime
76 : 235,5 seconds,   15 times,  0 changes, mintime
77 : 235,6 seconds,   15 times,  0 changes, mintime
78 : 235,7 seconds,   15 times,  0 changes, mintime
79 : 235,5 seconds,   15 times,  0 changes, mintime
7A : 235,5 seconds,   15 times,  1 changes, mintime
7B : 235,8 seconds,   16 times,  0 changes, mintime

What does this all mean?

00 is the Data-ID in hex and 0,4 seconds is the average interval between 2 frames, measured over 4678 ‘captured’ frames for this particular Data-ID. In all these 4678 frames the data value changed to another value 5 times, and the smallest interval between 2 data value changes was 4 seconds (floored).

Conclusions, assumptions?

  1. It’s better to do this analysis during the winter, where the boiler is really doing something and let the data collection run for 24 hours or so.
  2. Data-ID 00 tells me that the boiler must be the limiting factor here, cause the poll rate is much higher than the smallest interval between value changes. But it could also be that the status of the boiler really doesn’t change faster than that; each transition to another phase (going from idle to burning is not just 1 step) will take time; how much?
  3. Data-ID 1C could lead to the assumption that if the master polled faster, a lot of data values would show a different value compared to the previous data value.
  4. The master plays the biggest role in the ‘freshness’, cause it’s the master that dictates what Data-IDs the slave has to reply with – no matter how often the slave measures its return water temperature, if the master only requests this temperature once a minute then that’s what you get…

By the way, for those who have their OT Gateway hooked up to a serial port of a Windows PC, have a look at this!

How to deal with the Opentherm Gateway

After I installed our new thermostat and put the Opentherm Gateway between the cable from thermostat to the boiler, I got things up and running really fast. I had already developed a small application during my Opentherm Monitor adventure and I could reuse most of the code to display the incoming traffic from the Opentherm Gateway.

Opentherm Gateway data

The 3rd (B/T) and 4th column (“00000000”) are the data received from the OT (Opentherm) Gateway; the rest is added by the application.

Each OT frame arrives as 8 bytes (8 hex characters), and with more than 100 frames per minute this means I’ll have to process about 2.5 MB of OT data per day.

That might not seem like much, but this would mean the OT Gateway would win the gold medal as data producer in our house! (here you can see a list of all interfaces with their respective in- and outgoing streams, reset at midnight)

Well, I can start polling the OT Gateway with the PS command which will stop the OT gateway from reporting each frame, but I don’t like polling and I want to know what’s going on right away!

OK, another approach to reduce the number of frames could be to locally filter out duplicates per DataID (that’s OT protocol talk) and only report changes; that would reduce the amount of traffic immensely. Sounds like a nice job for a JeeNode; add a Digi XBee and I’m done 😉

 

Wow, not so fast…. now I’m thinking the way I used to – that’s not good. The integration of Opentherm into my system is a good exercise for my wish of transforming my system into a distributed application based on a messaging system. And that’s where I still have a lot of decisions to make.

ZeroMQ or MQTT (Mosquitto)? What should the topic schema look like? How do I keep it all maintenance free, adaptive to changes, etcetera…

I mean, I’ve seen enough examples of a single sensor publishing its value to a hard-coded IP address, but that’s not good enough for me – cause if you have a large system, you don’t want that. Nightmares! And this is just a simple example of the challenges I’m facing.

So I really have to think some things over very thoroughly before I proceed. Fortunately I’m not the only one struggling with these issues.

And maybe the choice for a thermostat with built-in schedule was not such a bad choice after all, cause I like to do things only once; the right way.

Maybe I just have to stop doing things for some time and start thinking a bit harder 😉

Exit Proliphix Thermostat

Time for something new…

Proliphix NT20e

The Proliphix NT20e has been in use for almost 3 years now. The Proliphix thermostat has brought me a lot of fun (integrating it into my Domotica system, working on the Homeseer Proliphix Plugin to add Celsius support) and comfort. But it’s time for a change!

Last year I built an Opentherm Gateway, because we noticed that a modulating boiler performed much better than an on/off controlled boiler – modulation made the ups and downs in temperature disappear. The temperature became much more constant, which also made the floor heating much more comfortable than before.

However, the Honeywell Evohome set I used last year didn’t work well with the Opentherm Gateway; I could not override the temperature setpoint with the Gateway, which was the primary reason why I built it 🙁

I don’t know why it didn’t work, but it may have something to do with the EvoHome RF communication being not 100% 2-way?

Honeywell Chronotherm

So yesterday I dismantled the Proliphix and replaced it with a Honeywell Chronotherm Modulation (wired version). The thermostat cable running from the boiler to this new thermostat has been extended so I can give the Opentherm Gateway a place out of sight, connect a Serial to Ethernet server to it and remotely monitor the OpenTherm traffic as well as override the room setpoint.

To be continued…

Opentherm Monitor finished

This post will be the last one about the Opentherm Monitor. OTOH, when is something really completely finished…

Arduino Serial Monitor

I could spend some more hours on the Opentherm (OT) Monitor and in particular the sketch, but for now it’s good enough. I should add some extra code to validate the OT frame but that would also mean I won’t be able to analyze ‘strange’ frames with unknown Data ID etcetera on my PC. So I’ll leave it as-is for  now. The Opentherm Gateway is waiting 😉

From what I’ve seen during the last 24 hours, the ‘quality’ of the frames I receive is quite good; somehow there seems to be an invalid frame on the wires every minute or so, and I can’t find out what it is. This same thing happens with the Opentherm Gateway Monitor, so I think both are having the same problem. The Data ID tells me it’s probably an OEM frame…?

OpenTherm Decoder

The Opentherm Decoder running on my PC receives the 4 OT bytes from a serial port and decodes those bytes to something human readable: whether the frame came from the Thermostat or the Boiler, the Message type and the meaning of the Data ID. The 16-bit data value (where you can find the temperatures, pressure and status bits) is not decoded yet; well, it’s all in the Opentherm Protocol documentation, so that should be no problem.

Now I can use this Opentherm Monitor as an additional display near the boiler! The Remeha Calenta already has a rather large display showing stuff like status, water pressure and whether the pump is running, but it doesn’t display flow and return temperature or control setpoint, and I’m sure I can think of some more interesting stuff I wanna see – that’s what the Opentherm Monitor is going to do for me. I already have a 16×4 LCD, so all I have left to do is find a suitable enclosure, build everything in there and I’m done!

I really liked getting this Opentherm Monitor to work without errors; in fact, getting it to work was more exciting than building it. Learning on the job about ATMega timers, Manchester decoding and programming the whole thing in C from scratch was one big adventure.

The most important references I used were:

And here’s the sketch: no additional libraries needed, free to use and no guarantees that it will work for you just as well as it does for me. Have fun!

 

Got it!

Yes, yes, YES! I love it 🙂

The rethinking of the OpenTherm bit capturing strategy I did a few days ago really did improve things quite a lot, as can be seen on a screendump of my OT Decoder:

"My" OpenTherm decoder

Below is what the Opentherm Monitor, a tool that belongs to the OpenTherm Gateway (yep, I’ve got one of those too, since a week or so 😉 ) is showing in the log:

Gateway Opentherm Monitor

This is a great, wonderful result! And the time it took to come this far was well spent, cause I’ve learned a lot in the past days. I’m starting to know the Opentherm Data ID’s by heart, I know some more about the ATMega timers and Manchester decoding has no secrets for me anymore.

You may notice that in my case only the Master (lines with a ‘T’ in them, in both screendumps) frames are decoded and not the Slave response frames (Slave=Boiler, starting with a ‘B’ in the lower screendump), but that’s because I added a delay that causes the sketch to skip the Slave responses.

So the sketch is going to be the next thing I’m going to work on in the next couple of days. But for today, I’m going to relax and stop thinking about microseconds, timers, prescalers and ISRs; it’s time for something completely different: a Somfy RS-485 RTS Transmitter for our rolling shutters which will (hopefully) arrive in a few weeks! That’s why I like Domotica so much – the diversity of things to do and learn.

 

Oops and yeah!

I don’t know what happened, but yesterday’s post is completely wrong. Maybe it was too late in the evening, maybe because I was trying to do 2 things simultaneously,  I don’t know – but the fact is, that yesterday’s post is totally messed up – it doesn’t reflect what I had in my mind at all… so that post was a major f***-up, if I may say so. Well, I’ll fix that some day. Soon, I hope. Maybe I can explain things better, once my thoughts are documented in code 😉

Fortunately, I didn’t use my own post for what I did today. It’s all in my head, right?  The status so far is that the JeeNode is correctly detecting the short and long periods and it’s reporting those on the Serial port (during the time where there’s no OT communication going on), in a format like this:

FT 1
0S
1S
0S
1L
0S
1S
0L
1L

The FT stands for ‘First Transition’, which tells me that a new transition from 0 to 1 has been spotted.  After that (line 2), the JeeNode input went LOW (0) for a short period (S), HIGH (1) for a short period, LOW for a short period and HIGH for a Long (L) period, …

With this information I can ‘rebuild’ the signal and the bit stream. For now, I’ve done this in my favorite programming language Delphi, just to speed things up a bit. The sketch has been running for just half an hour or so and the fact that the reconstructed bit streams have had a length of 68 bits all the time is encouraging. Cause 68 bits lead to 34 Manchester-decoded bits, which is exactly the size of an OpenTherm frame: 32 bits plus 1 start bit and 1 stop bit. Yeah!!
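The rebuilding itself comes down to two small steps: expand every short period into one half-bit and every long period into two, then decode the half-bits in pairs. A sketch of that in C (the version described above was written in Delphi and is not shown here). The names are hypothetical, and the convention that HIGH followed by LOW means 1 is an assumption; depending on the interface circuit it may have to be inverted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t level;   /* 0 = LOW, 1 = HIGH */
    char    length;  /* 'S' = short (half a bit), 'L' = long (a full bit) */
} Period;

/* Expand periods into half-bit levels.
   Returns the number of half-bits written, or -1 on bad input or overflow. */
int expand_half_bits(const Period *p, size_t n, uint8_t *half, size_t cap)
{
    size_t count = 0;
    assert(p != NULL && half != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t reps;
        if (p[i].length == 'S') reps = 1;
        else if (p[i].length == 'L') reps = 2;
        else return -1;
        if (p[i].level > 1 || count + reps > cap) return -1;
        for (size_t r = 0; r < reps; r++) half[count++] = p[i].level;
    }
    return (int)count;
}

/* Manchester-decode pairs of half-bits into bits (bits must hold n_half / 2).
   Assumed convention: HIGH then LOW is 1, LOW then HIGH is 0.
   Returns the number of bits, or -1 when the count is odd or a pair
   has no mid-bit transition. */
int manchester_decode(const uint8_t *half, size_t n_half, uint8_t *bits)
{
    assert(half != NULL && bits != NULL);
    if (n_half % 2 != 0) return -1;
    for (size_t i = 0; i < n_half; i += 2) {
        if (half[i] == half[i + 1]) return -1;
        bits[i / 2] = half[i];  /* first half HIGH means 1 */
    }
    return (int)(n_half / 2);
}
```

For a complete frame, `expand_half_bits` should produce 68 half-bits and `manchester_decode` 34 bits, of which the first and last are the start and stop bit.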

When I’m 100% sure of how everything should work, I’ll embed all the bit calculations & manipulations into the sketch, so that I’ll only receive the resulting 4 bytes per OpenTherm frame from the JeeNode.

But there’s still a lot of work to do before I get there!

Time based sampling vs signal duration

Not being ‘hindered’ by any knowledge about Manchester decoding and signal processing, I came up with the following idea. Rereading the OpenTherm Protocol 2.2 documentation, the following is said about the bitrate and timing:

Bit rate                           : 1000 bits/sec
Period between mid-bit transitions : 900 .. 1150 μs (nominal 1 ms)

Furthermore, there’s a built-in time margin in which a transition can take place: when T0 is the start of a bit period of ≈1000 μs, the transition must take place between T0+400 μs and T0+650 μs. If not, the transition wouldn’t conform to the OT protocol specs. This in fact means that the time window in which the transition can take place is rather large, namely 250 (100+150) μs. That’s 25% of the total bit period, which sounds like a lot to me, actually.

I want to try to code something that’s just as flexible – maybe even more flexible, considering the fact that it’s not unusual for things to be out of spec. Wouldn’t it be better to search for other things that can be measured just as well but which are less dependent on time? Determining long vs. short periods should be enough, maybe? So let’s forget about the timing and have a look at the highs and lows for a change. One thing that could be useful is the fact that the signal must always be stable for a period of at least 250 μs; cause if not, it would be out of OT protocol specs.

You could also say that a short period should be between 400 and 650 μs and a long period between 750 and 1250 μs. That means there’s a gap of 100 μs between the longest ‘short period’ and the shortest ‘long period’, but all still within specs. I should be able to determine whether a signal has been stable during a short period or a long one… right?
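Expressed in code, that classification is a pair of range checks. A minimal C sketch using exactly the limits mentioned above (400..650 μs and 750..1250 μs); the names are made up, and whether these limits hold up against a real signal is still an open question at this point.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PERIOD_INVALID, PERIOD_SHORT, PERIOD_LONG } PeriodClass;

/* Classify the time the signal stayed at one level, in microseconds.
   Anything outside both ranges, including the 100 us gap between
   them, is reported as invalid. */
PeriodClass classify_period(uint32_t duration_us)
{
    if (duration_us >= 400 && duration_us <= 650)  return PERIOD_SHORT;
    if (duration_us >= 750 && duration_us <= 1250) return PERIOD_LONG;
    return PERIOD_INVALID;
}
```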

Update:

The rest of this post has been deleted, because it was total rubbish and incorrect – what was I thinking?? Too much beer perhaps… 😉

This will soon be fixed…

Transition timing

Somehow the OpenTherm Monitor I built a week ago has problems correctly Manchester-decoding the signal and producing the correct 32 bits. And the strange thing is that my logic analyzer, which I attached to the input pin on the JeeNode, doesn’t seem to have these problems:

BugLogic

The latest version of the Saleae Logic software has Manchester decoding included and the software can make sense of all this, so… Maybe the analyzer is more tolerant? I don’t know.

Time to figure out how ‘good’ the signal really is, and whether it’s worth the time to try to improve the OT Monitor – I don’t just want to give up!

The first thing I did was to have a look at the distribution of the periods between all the transitions (high <–> low). For that I wrote a sketch that used TIMER2 to check the value of the input pin every 80 µs. The time between each transition was measured and stored into an array during sampling. And after a certain number of transitions, it would write the array to the Serial port.
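The counting part of such a measurement can be sketched in plain C, leaving out the timer interrupt and the serial output. This is not the sketch that ran on the JeeNode; it assumes the pin levels are already available as an array of samples taken 80 µs apart, and the names `period_histogram` and `MAX_TICKS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TICKS 32  /* periods of MAX_TICKS-1 ticks or longer share the last bin */

/* samples[i] is the pin level at tick i (one tick = 80 us).
   hist[k] is incremented for every period that lasted k ticks; the
   caller zeroes hist first. The runs before the first and after the
   last transition are skipped because their full length is unknown. */
void period_histogram(const uint8_t *samples, size_t n, uint32_t hist[MAX_TICKS])
{
    size_t run = 0;
    int seen_transition = 0;

    assert(samples != NULL && hist != NULL);
    for (size_t i = 1; i < n; i++) {
        run++;
        if (samples[i] != samples[i - 1]) {
            if (seen_transition) {
                size_t k = (run < MAX_TICKS) ? run : MAX_TICKS - 1;
                hist[k]++;
            }
            seen_transition = 1;
            run = 0;
        }
    }
}
```

Multiplying a bin index by 80 µs gives the duration it stands for, so bin 6 is 480 µs and bin 12 is 960 µs.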

The Serial Output looked something like this:

2 0
3 1
4 1
5 217
6 2795
7 1246
8 23
9 0
10 0
11 16
12 325
13 281
14 17
15 0
16 1
17 0

Copy & paste to Excel and creating a chart revealed the following:

 

Here you see the distribution of all the measured periods between transitions (both high to low and vice versa). The numbers on the X-axis should be multiplied by 80 µs to get the real duration of the periods. Well, it’s clear that there’s a peak at ≈500 µs and a second one at ≈1000 µs (1 ms); that’s good and what can be expected for a signal with a 1 ms bit period, where the long periods are twice as long as the short ones. So far so good.

Another test I did was to see how the ‘low’ periods and ‘high’ periods related to each other; did they both perform equally well?

Don’t mind the X-axis numbers, I’m not that good with Excel 😉 – it’s the red (high periods) and blue (low periods) lines that count. Again, the time of the second peak (63) is twice as large as the first (31..32). That’s good.

But the chart is not that ‘clean’ anymore… some as yet unexplainable things popped up. For example, the leftmost peak of the high periods (i.e. the short high periods) shows a strange curve to the right. In words, this means that there are quite a lot of short high periods that take longer than you’d want – cause the ideal situation would be a chart with only 2 narrow, high peaks for both the highs and lows, right? Could this mess up things? It could very well be that 1 single transition messes up the timing and hence the decoding of a complete OT frame…

I don’t know if that last thought (1 single period messing up things)  is right  – I have to dig some more, understand the OT Monitor sketch, try to find out where in the frame the timing goes wrong and see if I can find a pattern somewhere. If there’s a pattern, maybe I can create a work-around for it to improve the decoding..

Ow, and why’s there a huge dip in the blue (low) chart?

To be continued…