geektimes

Understanding the POCSAG paging protocol

  • Thursday, February 7, 2019, 00:12:34
https://habr.com/en/post/438906/
  • Gadgets
  • Old hardware
  • Python
  • Systems engineering
  • Communication standards


A long time ago, when a mobile phone cost about $2000 and one minute of a voice call cost 50 cents, pagers were really popular. Later, cellular phones became cheaper, call and SMS prices dropped, and pagers mostly disappeared.



This article will be useful for people who once owned a pager and want to know how it works.

Main info


For those who have forgotten the principles or were born after 2000, here is a short reminder of the main ideas.

A paging network has some advantages that are sometimes important even now:

- It is one-way communication without any sort of confirmation, so the network cannot be overloaded: its load simply does not depend on the number of users. Messages are transmitted continuously «as is», one after another, and a pager picks up a message if the message's number (the so-called capcode) equals the device's internal number.

- The receiver is very lightweight (both literally and electronically) and can work for up to a month on 2 AA batteries.

There are two basic standards for transmitting messages: POCSAG (Post Office Code Standardization Advisory Group) and FLEX. Both standards are pretty old. POCSAG was created in 1982 and supports speeds of 512, 1200 and 2400 bit/s. It is transmitted using FSK (frequency shift keying) with a 4.5 kHz frequency separation. FLEX is a bit newer (it was created by Motorola in the 1990s), works at up to 6400 bit/s and can use both 2-level FSK (FSK2) and 4-level FSK (FSK4).

Both protocols are fairly simple, and about 20 years ago PC decoders appeared that could decode messages from a serial port or a sound card (no encryption is supported, so all messages can be read by anyone).

Let's have a look at how it works.

Receiving a signal


First, we need a signal to decode. Let's take a laptop and an rtl-sdr receiver and get one.



Frequency shift keying is used, so we set the demodulation mode to FM. With HDSDR we save the signal in WAV format.

Let's check what we got. We load the WAV file as a Python data array:

from scipy.io import wavfile
import matplotlib.pyplot as plt

fs, data = wavfile.read("pocsag.wav")
plt.plot(data)

plt.show()

Output (bits added manually):



As we can see, it is simple: even «with the naked eye» we could draw the bits in Paint, because it is easy to tell where «0» is and where «1» is. But doing it manually would take too long, so it is time to automate the process.

After enlarging the graph, we can see that each bit is 20 samples wide. The WAV file has a sample rate of 24000 samples per second, so the keying speed is 24000 / 20 = 1200 bit/s. Let's find a zero crossing position, which is the beginning of the bit sequence. Let's also add markers to verify that all bits are in the proper places.

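import numpy as np

speed = 1200           # keying speed, bit/s
fs = 24000             # sample rate of the recording, samples/s
cnt = int(fs/speed)    # samples per bit (20)

# Find the first upward zero crossing: the start of the bit sequence
start = 0
for p in range(2*cnt):
    if data[p] < -50 and data[p+1] > 50:
        start = p
        break

# Bits frame: one marker at every expected bit boundary
bits = np.zeros(data.size)
for p in range(start, data.size, cnt):
    bits[p] = 500

# Draw the markers on top of the signal
plt.plot(data)
plt.plot(bits)
plt.show()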

As we can see, the match is not perfect (the transmitter and the receiver have slightly different clock frequencies), but it is definitely good enough for decoding.



For long signals we would probably need an automatic frequency correction algorithm, but for a signal this short it is not critical.
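One simple way to cope with such drift is to re-align the bit clock at every sign change of the signal instead of relying on a fixed 20-sample grid. The sketch below only illustrates that idea and is not the correction used in this article. The name `slice_bits` and its parameters are made up for the example, the test signal is synthetic, and the polarity follows the article (a negative level means «1»).

```python
def slice_bits(samples, samples_per_bit):
    """Read one bit per bit period, restarting the bit clock at every edge."""
    out = []
    next_center = samples_per_bit / 2  # where the first bit should be sampled
    for i in range(len(samples)):
        if i > 0 and (samples[i] > 0) != (samples[i - 1] > 0):
            # Sign change: a new bit starts here, so re-align the clock to it
            next_center = i + samples_per_bit / 2
        if i >= next_center:
            # Inverted polarity, as in the article: negative level means "1"
            out.append("1" if samples[i] < 0 else "0")
            next_center += samples_per_bit
    return "".join(out)


# Synthetic check: the transmitter uses 20.3 samples per bit, the receiver assumes 20
pattern = ("10" * 8 + "01111100110100100001010111011000") * 3
n_samples = len(pattern) * 203 // 10
signal = [-1000 if pattern[n * 10 // 203] == "1" else 1000 for n in range(n_samples)]
print(slice_bits(signal, 20) == pattern)
```

With a fixed grid the 0.3-sample error per bit would add up to half a bit after about 34 bits. Restarting the clock at each edge keeps the error bounded as long as the runs of identical bits stay short.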

The last step is to translate the WAV file into a bit sequence. This is also easy: we know the length of each bit, so if the sum of the samples within a bit is positive we add «1», otherwise «0» (it later turned out that the signal needs to be inverted, so 0 and 1 are swapped in the code).

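bits_str = ""
for p in range(0, data.size - cnt, cnt):
    # Sum all samples that belong to this bit
    s = 0
    for p1 in range(p, p+cnt):
        s += int(data[p1])  # int() avoids overflow of 16-bit samples
    # The signal is inverted: a negative sum means "1"
    bits_str += "1" if s < 0 else "0"

print("Bits")
print(bits_str)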


The output is a proper bit sequence (as a string) that contains our message.

101010101010101010101010101010101010101010101010101010101010101010101010101
010101010101010101010101010101010101010101010100111110011010010000101001101
100001111010100010011100000110010111011110101000100111000001100101110111101
010001001110000011001011101111010100010011100000110010111011110101000100111
000001100101110111101010001001110000011001011101111010100010011100000110010
011011110101000100111000001100101110111101010001001110000011001011101111010
100010011100000110010111011110101000100111000001100101110111101010001001110
...
111101111


Decoding numeric-only messages


A bit sequence is much more convenient than a WAV file: we can extract data from it. First, let's split the data into 4-byte (32-bit) blocks.

10101010101010101010101010101010
10101010101010101010101010101010
10101010101010101010101010101010
10101010101010101010101010101010

01111100110100100001010011011000
01111010100010011100000110010111
01111010100010011100000110010111
01111010100010011100000110010111
01111010100010011100000110010111
00001000011011110100010001101000
10000011010000010101010011010100

01111100110100100001010111011000
11110101010001000001000000111000
01111010100010011100000110010111
01111010100010011100000110010111
01111010100010011100000110010111
00100101101001011010010100101111


We can definitely see a pattern. Now we need to find out what each part means. The POCSAG manual is available in PDF format, so let's check its description of the data structures.



Now it is much clearer. The header contains a long block «10101010101», which is used to wake the pager up from sleep mode. The message itself consists of the blocks Batch-1 … Batch-N, and each batch starts with the unique Frame Sync Code (FSC). Then, as we can see in the manual, if a string begins with «0», it contains the recipient address. The address itself (the capcode) is stored in the pager, and if it does not match, the pager ignores the message. If a string starts with «1», it contains the message body. In our example we have 2 strings of this type.
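Locating the batches can also be automated. Note that the two sync strings in the listing above differ in one bit (the first one was received with an error), so an exact string comparison would miss it. The sketch below searches for the sync word while tolerating a few wrong bits. The helper names and the tolerance of 2 bits are assumptions made for this example, and the test stream is synthetic.

```python
# Frame Sync Code (0x7CD215D8), identical to the second sync string in the listing
FSC = "01111100110100100001010111011000"
CODEWORDS_PER_BATCH = 16


def hamming(a, b):
    """Number of positions at which two equally long bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_batches(bits_str, max_errors=2):
    """Return the 16-codeword body of every batch that follows a sync word."""
    batches = []
    pos = 0
    while pos + 32 <= len(bits_str):
        if hamming(bits_str[pos:pos + 32], FSC) <= max_errors:
            body = bits_str[pos + 32:pos + 32 + 32 * CODEWORDS_PER_BATCH]
            batches.append(body)
            pos += 32 + len(body)  # continue right after this batch
        else:
            pos += 1  # no sync word here, slide the window by one bit
    return batches


# Synthetic stream: preamble, a sync word with one bit error, 16 idle codewords
idle = "01111010100010011100000110010111"
noisy_sync = "01111100110100100001010011011000"  # first sync string in the listing
stream = "10" * 40 + noisy_sync + idle * 16
for batch in find_batches(stream):
    print(len(batch) // 32, "codewords")
```

Each returned body can then be cut into 32-bit codewords, as in the listing above.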

We won't check every block. We can also see idle codes, the blocks 01111...0111, which carry no useful information. After removing them, we are left with only this:

01111100110100100001010011011000 - Frame Sync
00001000011011110100010001101000 - Address
10000011010000010101010011010100 - Message

01111100110100100001010111011000 - Frame Sync
11110101010001000001000000111000 - Message
00100101101001011010010100101111 - Address
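The same labelling can be written as a few lines of Python. This is a minimal sketch that assumes the sync words have already been stripped and only codewords from inside a batch are passed in. The function name `classify` is hypothetical.

```python
IDLE = "01111010100010011100000110010111"  # idle codeword, as seen in the listing


def classify(codeword):
    """Label one 32-bit codeword taken from inside a batch."""
    if codeword == IDLE:
        return "idle"
    # The first bit is the flag: 0 = address, 1 = message
    return "address" if codeword[0] == "0" else "message"


batch = [
    "00001000011011110100010001101000",
    "10000011010000010101010011010100",
    IDLE,
    "11110101010001000001000000111000",
    "00100101101001011010010100101111",
]
for cw in batch:
    print(cw, "-", classify(cw))
```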


We need to find out what is inside.

After checking the manual, it is clear that there are two types of messages: numeric-only and alpha-numeric. Numeric-only messages are stored as 4-bit BCD codes, so 20 bits can contain 5 symbols (there are also check bits, which we are not using for now). If the message is alpha-numeric, 7-bit ASCII encoding is used. This message is too short, so it can only be a numeric-only message.

From the strings 10000011010000010101010011010100 and 11110101010001000001000000111000 we can get these 4-bit sequences:
1 0000 0110 1000 0010 1010 10011010100 -> 0h 6h 8h 2h Ah
1 1110 1010 1000 1000 0010 00000111000 -> Eh Ah 8h 8h 2h

The next step is to get the decoding table from the manual:



The table shows that a numeric-only message can contain the digits 0-9, the letter U («urgent»), a space and two brackets. Let's write a small function to decode it:

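def parse_msg(block):
    # Decoding table for the 4-bit numeric codes (index = code value)
    symbols = "0123456789*U -)("
    # 16 codewords in a batch, each codeword is 32 bits long
    for cw in range(16):
        cws = block[32 * cw:32 * (cw + 1)]
        if len(cws) < 32:
            break  # incomplete codeword at the end of the data
        if cws[0] == "0":
            # Address codeword: 18 address bits follow the flag bit
            addr = cws[1:19]
            print("  Addr:" + addr)
        else:
            # Message codeword: 20 payload bits = five 4-bit symbols
            msg = cws[1:21]
            print("  Msg: " + msg)
            size = 4
            s = ""
            for ind in range(0, len(msg), size):
                bcd_s = msg[ind:ind + size]
                value = int(bcd_s, 2)
                s += symbols[value]
            print("    ", s)
    print()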

Finally, we get a message «0682*)*882».

It is hard to tell what it means, but since numeric-only messages are in use, somebody probably needs them.

Decoding alpha-numeric messages


The next, and more interesting, step is to decode alpha-numeric messages. It is more interesting because the output should be human-readable text.

First, we need to record a message again, and again we use HDSDR. We don't know the message type before decoding, so we just record the longest message we can get and hope that it contains some text.



After converting from WAV to a bit sequence (see the Python code above), we get this:



Some interesting things can be seen immediately, with the naked eye. For example, the start sequence 01010101010101 is repeated twice. Thus, this recording is not only longer, it literally contains two messages merged together (the standard does not forbid this, by the way).

As we found before, each batch starts with a sequence called the Frame Sync Code (01111100...), which is followed by 32-bit blocks. Each block can store either an address or a part of the message body.

Earlier we got numeric-only messages; now we want to read ASCII messages. First, we need to distinguish between them. This information is stored in the «Function Bits» field (bits 20-21) of the address block: if the bits are 00, it is a numeric-only message, and if they are 11, it is a text message.
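As a small illustration, the fields of an address block can be cut out with string slices. The sketch below counts bits from 1, as the text does, so bits 20-21 are the string indices 19 and 20. The function name and the «other» label for the remaining bit combinations are assumptions of this example.

```python
MESSAGE_TYPES = {"00": "numeric-only", "11": "alpha-numeric"}


def parse_address(codeword):
    """Split a 32-bit address codeword into its address and function fields."""
    if len(codeword) != 32 or codeword[0] != "0":
        raise ValueError("not an address codeword")
    address_bits = codeword[1:19]    # 18 address bits
    function_bits = codeword[19:21]  # bits 20-21 when counting from 1
    kind = MESSAGE_TYPES.get(function_bits, "other")
    return address_bits, function_bits, kind


# The address block of the numeric-only message decoded earlier
print(parse_address("00001000011011110100010001101000"))
```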

It is interesting to note that the message field is 20 bits long, which is ideal for five 4-bit blocks in the case of a numeric-only message. But with a 7-bit ASCII message, 20 is not divisible by 7. We can guess that the first protocol version supported only numeric messages (don't forget that it was made in 1982, and the first nixie-tube pagers probably could not display anything more), and ASCII message support was added only later. For legacy reasons the framing was not changed, and the developers took the easy approach: they just concatenated the bits «as is», one after another. From each block we need to take the 20 message bits and append them to those of the next one; after that we can decode the message body.

Let's look at one block of our message (spaces were added for readability):

0 0001010011100010111111110010010
1 00010100000110110011 11100111001
1 01011010011001110100 01111011100
1 11010001110110100100 11011000100
1 11000001101000110100 10011110111
1 11100000010100011011 11101110000
1 00110010111011001101 10011011010
1 00011001011100010110 10011000010
1 10101100000010010101 10110000101
1 00010110111011001101 00000011011
1 10100101000000101000 11001010100
1 00111101010101101100 11011111010


The «0» bit at the start of the first string shows that it is the address field, and «11» in bits 20-21 shows that the message really is alpha-numeric. Then we just take 20 bits from every following string and merge them together.

This is our bit sequence:

00010100000110110011010110100110011101001101000111011010010011000001101000
11010011100000010100011011001100101110110011010001100101110001011010101100
000010010101000101101110110011011010010100000010100000111101010101101


POCSAG uses 7-bit ASCII, so we split the string into 7-character blocks:

0001010 0000110 1100110 1011010 0110011 1010011 ...

After trying to decode it (an ASCII table can easily be found on the Internet), we get… just nothing. Checking the manual again, we find the small phrase «ASCII characters are placed from left to right (MSB to LSB). The LSB is transmitting first.». So the low bit is transmitted first, and for correct decoding we need to reverse every 7-bit block.

It is too boring to do manually, so let's write some Python code:

def parse_msg(block):
    msgs = ""
    for cw in range(16):
        cws = block[32 * cw:32 * (cw + 1)]
        if len(cws) < 32:
            break  # incomplete codeword at the end of the data
        # Skip the idle word
        if cws.startswith("0111101010"):
            continue

        if cws[0] == "0":
            # Address codeword: 18 address bits and 2 function bits
            addr, func = cws[1:19], cws[19:21]
            print("  Addr:" + addr, func)
        else:
            # Message codeword: collect the 20 payload bits
            msg = cws[1:21]
            print("  Msg: " + msg)
            msgs += msg

    # Split the long string into complete 7-character blocks
    bits = [msgs[i:i+7] for i in range(0, len(msgs) - 6, 7)]

    # Get the message
    msg = ""
    for b in bits:
        b1 = b[::-1]  # Reverse the block: the LSB is transmitted first
        value = int(b1, 2)
        msg += chr(value)

    print("Msg:", msg)
    print()

Finally, we get this sequence (bits, symbol codes and ASCII symbols):

0101000 40 (
0110000 48 0
0110011 51 3
0101101 45 -
1100110 102 f
1100101 101 e
1100010 98 b
0101101 45 -
0110010 50 2
0110000 48 0
0110001 49 1
0111001 57 9
0100000 32
0110001 49 1
0110011 51 3
0111010 58 :
0110011 51 3
0110001 49 1
0111010 58 :
0110100 52 4
0110101 53 5
0100000 32
0101010 42 *
0110100 52 4
0110111 55 7
0110110 54 6
0101001 41 )
0100000 32
1000001 65 A
1010111 87 W
1011010 90 Z

After merging we get the string "(03-feb-2019 13:31:45 *476) AWZ". As promised, it is quite human-readable.
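This result can be reproduced directly from the twelve strings listed earlier, without the WAV file. The sketch below is a compact, self-contained variant of the steps described above: take 20 bits from every message block, cut them into complete 7-bit groups, and reverse each group. The function name `decode_text` is made up for this example, and the three leftover bits at the end are simply dropped.

```python
listing = """
0 0001010011100010111111110010010
1 00010100000110110011 11100111001
1 01011010011001110100 01111011100
1 11010001110110100100 11011000100
1 11000001101000110100 10011110111
1 11100000010100011011 11101110000
1 00110010111011001101 10011011010
1 00011001011100010110 10011000010
1 10101100000010010101 10110000101
1 00010110111011001101 00000011011
1 10100101000000101000 11001010100
1 00111101010101101100 11011111010
"""


def decode_text(codewords):
    """Decode the alpha-numeric payload of a list of 32-bit codeword strings."""
    # 20 payload bits from every message codeword (flag bit "1")
    payload = "".join(cw[1:21] for cw in codewords if cw[0] == "1")
    chars = []
    for i in range(0, len(payload) - 6, 7):   # complete 7-bit groups only
        group = payload[i:i + 7]
        chars.append(chr(int(group[::-1], 2)))  # the LSB is transmitted first
    return "".join(chars)


codewords = [line.replace(" ", "") for line in listing.split("\n") if line]
print(decode_text(codewords))
```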

By the way, it is interesting to note that 7-bit ASCII codes are used. The symbols of some alphabets (German, Cyrillic, etc.) cannot be properly encoded in 7 bits. Why 7 bits? Probably the engineers decided that «7 bits will be enough for everybody», who knows…

Conclusion


It was really interesting to investigate how POCSAG works. It is one of the rare protocols still in use today that can literally be decoded on a sheet of paper (and I definitely will not try this with TETRA or GSM).

Of course, the POCSAG protocol is not fully described here. The most important and interesting part is done; the rest is not so exciting. At the very least, there is no capcode decoding and no error correction (the BCH check bits), which allows up to 2 wrong bits in a message to be fixed. But the goal was not to write yet another POCSAG decoder, as there are already enough of them.
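For the curious, here is a rough sketch of what the check bits look like in code. It only detects errors and does not correct them. It assumes the generator polynomial x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1 that is usually quoted for the POCSAG BCH(31,21) code, followed by one even parity bit. That polynomial is not taken from this article, although it does reproduce the last 11 bits of the idle codeword seen earlier. All function names are hypothetical.

```python
GENERATOR = 0b11101101001  # x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1 (assumed)


def bch_check_bits(data21):
    """Remainder of data21 * x^10 divided by the generator (10 check bits)."""
    reg = data21 << 10
    for shift in range(20, -1, -1):
        if reg & (1 << (shift + 10)):
            reg ^= GENERATOR << shift
    return reg


def build_codeword(data21):
    """Append the 10 check bits and an even parity bit to 21 data bits."""
    word31 = (data21 << 10) | bch_check_bits(data21)
    parity = bin(word31).count("1") & 1
    return (word31 << 1) | parity


def is_valid(codeword_bits):
    """True if a 32-character bit string passes the BCH and parity checks."""
    value = int(codeword_bits, 2)
    return build_codeword(value >> 11) == value


idle = "01111010100010011100000110010111"
corrupted = idle[:5] + ("1" if idle[5] == "0" else "0") + idle[6:]
print(is_valid(idle), is_valid(corrupted))
```

A real decoder would go one step further and use the remainder to locate and flip the wrong bits.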

Those who want to test real decoding with an rtl-sdr can use the freeware PDW application. It does not require installation; it is enough to forward the sound from HDSDR to PDW via the Virtual Audio Cable application.

The result looks like this:


(Please keep in mind that decoding public service messages can be illegal in some countries, and in any case respect the privacy of the recipients.)

If somebody wants to get more information about this topic, the sources of the multimon-ng decoder are available; it can decode many protocols, including POCSAG and FLEX.

Thanks for reading.