Sniffing the ISM band at 912MHz

There's a band of ISM (Industry, Science, and Medicine) spectrum at 915ish MHz. ISM spectrum is a place where unlicensed devices can operate under FCC part 15 restrictions (or part 18 if they're licenced). This is where all those "900MHz" phones and whatnot live.

There's potentially a lot of interesting stuff lurking in this band: running a simple flowgraph to watch the waterfall of it shows a bunch of tiny packets flying around. Unfortunately, it's really hard to figure out where/when they are, so you're left dumping the spectrum to a file, and then post-processing it. Doing that in GRC is a bit ugly, so I decided to pull the data into IPython and see what I could do over here.

Part of this is that we don't know what we're looking for, or where. To make it easier to work with, we'll capture a buuuunch of data, then find the packets within it iteratively. This is very memory-conserving: doing the whole thing at once runs off into Bad Places in terms of performance. Note that GNU Radio's native components don't have this issue, which is the main benefit of its architecture. However, I find writing a GNU Radio block to test my code a bit tedious/distracting from the actual goal. This file is a great clean-up of the five(!) different notebooks I made in the process of figuring out reasonable ways of solving these problems; mixing in blocks probably would have just made it even more scattershot in the first versions.

(NB: the code below takes in a complex short integer input. If you're capturing in the usual complex format, you'll need to adjust the fetch_x() function to handle the appropriate input format. It'll probably be just a symbol size of 2*4 bytes, then fromfile("foo.sdr", dtype="complex64"), skipping the subsequent hoops I have to jump through to go from two signed ints to a reasonably-sized complex value.)

In [430]:
%pylab inline

pylab.rcParams["figure.figsize"] = 14,4

import scipy
import scipy.signal

import gc

import os
import json
Populating the interactive namespace from numpy and matplotlib

Data access helper function

To work from disk, we're going to do a bit of our own crappy buffering. This function is how you request a span of samples from the input file. It'll return them as a complex np.array.

In [431]:
# We're going to need a wackypants way to get to the right places here

FILENAME="912_snoop_si16.sdr"
SAMPLE_SIZE = 2*2 # make this 2*4 if you're using complex inputs

Fs = 4e6   # Sample rate of our input
Fc = 912e6 # Carrier frequency of the input; not really used, but good to know.



def fetch_x(sample_offset, sample_count):
    f = file(FILENAME)
    f.seek(SAMPLE_SIZE*sample_offset)
    b = f.read(SAMPLE_SIZE*sample_count)
    f.close()

    # If you're using complex input, just do 
    # return np.frombuffer(b, dtype="complex64")
    x = np.frombuffer(b, dtype="int16")
    x = x.reshape(len(x)/2,2)

    xc = (x[:,0] + 1.0j*x[:,1]).astype("complex64")
    del(x)
    gc.collect()
    return xc

s = os.stat(FILENAME)
file_length_samples = s.st_size / SAMPLE_SIZE
file_length_samples
Out[431]:
120777916

Data structures

Before we get to the process of finding the actual packets, we need to create the structures that will contain the information about those packets.

The core unit here is the "Hit", which is just a frequency+bandwidth with sample_start+sample_length. That is: it's a bounding box on the waterfall plot of the input file. We can then look for places where these butt into each other, and merge them together. This is generally only going to happen when we have a partial hit for a given signal. We'll get those partials any time a packet happens to cross one of our sample buffer blocks.

There's also a HitList class, which is just an array of Hits that knows what to do when merging two Hits together, and has an algorithm to find merge candidates. This could be improved a bunch by moving to something like a quadtree for the initial pass, and stuffing them in as we go along, but, that's for later.

In [432]:
class Hit:
    """Container for a single packet "hit" in the data stream
    
    We use these to keep notes on where in the RF stream a packet might be found.
    
    They key variables here are:
    * Fc -- center frequency
    * bw -- bandwidth
    * Fs -- sample rate of the original data stream
    * sample_start -- offset in the original system's sample stream where it's found
    * sample_end -- length of the sample in the origin's sample rate
    """
    def __init__(self, Fs, Fc, bw, sample_start, sample_length, bw_fudge=None, sample_start_fudge=2000, sample_length_fudge=8000):
        """Create a new Hit entry
         * Fs -- sample rate of original system
         * Fc -- center frequency of packet
         * bw -- bandwidth of packet
         * sample_start -- initial sample position of packet (in Fs)
         * sample_length -- length of packet (in Fs)
         * bw_fudge -- amount to grow the bandwidth for purposes of capture and matching (default: +2x top and bottom, so 5x)
         * sample_start_fudge -- amount to grow the sample span at its head
         * sample_length_fudge -- amount to grow the sample span at its tail
        """
        self.Fs = Fs
        self.Fc = Fc
        self.bw = bw
        
        self.bw_fudge = bw_fudge
        
        self.sample_start_fudge = sample_start_fudge
        self.sample_length_fudge = sample_length_fudge
        
        if None == bw_fudge:
            #self.bw += 4*bw
            self.bw_fudge = 2*bw
        else:
            #self.bw += bw_fudge
            pass
        
        self.sample_start = sample_start # - sample_start_fudge
        self.sample_length = sample_length # + sample_length_fudge
  
    def f_low(self, fudge=True):
        "Low end of the frequency span for this Hit, including fudge factors"
        fudge = self.bw_fudge if fudge else 0
        return self.Fc - self.bw/2 - fudge
    
    def f_high(self, fudge=True):
        "High end of the frequency span for this Hit, including fudge factors"
        fudge = self.bw_fudge if fudge else 0
        return self.Fc + self.bw/2 + fudge
    
    def in_band(self, f):
        "Query if the given frequency is within the frequency span for this Hit"
        return self.between(self.f_low(), self.f_high(), f)
    
    def s_low(self, fudge=True):
        "Sample offset of the start of the packet in the origin stream (measured in source samples)"
        fudge = self.sample_start_fudge if fudge else 0
        return int(self.sample_start - fudge)
    
    def s_high(self, fudge=True):
        "Sample offset of the end of the packet in the origin stream (measured in source samples)"
        fudge = self.sample_length_fudge if fudge else 0
        return int(self.sample_start+self.sample_length + fudge)
    
    def s_length(self, fudge=True):
        return self.s_high(fudge) - self.s_low(fudge)
    
    def in_time(self, s):
        "Query if the given sample number is within the frequency span for this Hit"
        return self.between(self.s_low(), self.s_high(), s)
    
    def between(self, a, b, q):
        "Helper method: check if q falls between a and b. Will reorder a/b internally if needed."
        if b < a:
            t = a
            a = b
            b = t
            
        return ( (a <= q) and (q <= b) )
    
    def could_merge(self, h2):
        "Given another Hit h2, see if they overlap in both samples and frequency."
        f0_in = self.in_band(h2.f_low())
        f1_in = self.in_band(h2.f_high())
        
        s0_in = self.in_time(h2.s_low())
        s1_in = self.in_time(h2.s_high())
        
        if (f0_in or f1_in) and (s0_in or s1_in):
            return True
        
        return False
    
    def merge(self, h2):
        "Given another Hit h2, find a new Hit that subsumes both in time and freqency."
        f_low = min(self.f_low(False), h2.f_low(False))
        f_high = max(self.f_high(False), h2.f_high(False))
        
        s_low = min(self.s_low(False), h2.s_low(False))
        s_high = max(self.s_high(False), h2.s_high(False))
    
        f_bw = f_high - f_low
        s_length = s_high - s_low
        
        return Hit(self.Fs, (f_low+f_high)/2.0, f_bw, s_low, s_high-s_low, self.bw_fudge, self.sample_start_fudge, self.sample_length_fudge)
    
    def __str__(self):
        "Helper method."
        return "%f / %f, %d + %d" % (self.Fc, self.bw, self.sample_start, self.sample_length)
    
    
class HitList:
    """Container for a bunch of Hit entries, and coordinator of merging passes
    
    This is where all the Hits go, and coordinates the merge process for gluing Hits together into larger packets.
    
    """
    def __init__(self):
        self.hits = []        
        self.merged = None # Initialize to None as a sentry for run_merge's bootstrap
        
    def mark(self, Fs, Fc, bw, sample_start, sample_length, **kwargs):
        h = Hit(Fs, Fc, bw, sample_start, sample_length, **kwargs)
      
        self.hits.append(h)
    
    def _do_merge(self, hits):
        "Helper method: runs a merge pass on the input array of Hits, returns new array of Hits post-merge"
        merged = []
        
        for i,h_i in enumerate(hits):
            did_merge = False
                        
            for m,h_m in enumerate(merged):
                if h_i.could_merge(h_m) or h_m.could_merge(h_i):
                    merged[m] = h_m.merge(h_i)
                    did_merge = True
                    break
                    
            if not did_merge:
                merged.append(h_i)
        return merged
    
    def run_merge(self):
        "Run a merge pass over the current merged hits, further merging.  Uses input if no merge has been run."
        if None == self.merged:
            l0 = len(self.hits)
            self.merged = self._do_merge(self.hits)
        else:
            l0 = len(self.merged)
            self.merged = self._do_merge(self.merged)
        return (l0, len(self.merged))
    
hl = HitList()

Find packets

This is where the first bit of magic happens: we find all the packets in the original file. This is done by buffering in blocks of data, identifying statistical outliers, and then running a state machine to extract adjacent outliers.

Once we have that list, we can start looking at each packet in detail.

Buffering

To find the packets without loading the whole file into memory at the same time, we work through it in large-ish chunks, finding the Hits within each chunk. Then, after we've gone through the whole file, we run the merge algorithm over the HitList to get down to the minimal set of packets in the entire file.

To ensure we don't miss anything, we take buffers which overlap each other by a bit. This results in redundant Hits in a lot of cases, but that's fine: the merge algorithm will just take care of them.

Finding packet candidates with statistics

To find candidate packets, we basically apply a simplistic AGC+squelch to each buffer of data. We first run specgram to get the spectrogram of the data. You can think of a spectrogram as a waterfall display turned into a two dimensional array of data. It's basically (frequency bin, time bin) => energy.

We sample a bunch of points of that data to find out what the distribution of energy for each (frequency,time) bin is. We build an Empirical Cumulative Density Function (ECDF) of the data from this sample, so we have a rough idea where the Nth percentile falls. We then go through the data and threshold it as being 98th percentile or above, or below 98th percentile. This gives us another 2-D array of (frequency,time) -> {above,below}

The 98th percentile here is entirely arbitrary, and may not be appropriate if you're trying to monitor a band where more than 2% of the frequencies are in use. You can try twiddling it around for our local conditions. Similarly, the sample size used to determine the 98th percentile is arbitrary, though you want it to be reasonably large. 1000 has served me well enough, but I've also used 10k in this project, without really noticing any differences.

Also, the width of the frequency bins may matter. You want to have a rough idea what the minimum target bandwidth you're after is, as a very strong narrow signal will get washed out if your bins are much larger than it.

Assembling possible hits

After we've identified individual bins that may contain packets, we walk across this array of (frequency,time) bins. We move across it left-to-right, moving forward in time with each step. When we see a frequency row go from below-threshold to above-threshold, we note the time when that row went high. Then, when we later on see that same row go from above-threshold to below-threshold, we can figure out how long that frequency bin was above the threshold. If it's above some minimum length, we feel very confident that this is a strong signal that stands out from the background noise, and isn't just random noise. I'm using a relatively long run length here, which gets rid of marginal signals that float around at the 98th percentile.

In [433]:
target_bw = 50000
max_bw = 200000
N_sample = 1000
p_cutoff = 0.95

nFFTs = int(2**(np.ceil(np.log2(Fs/target_bw*2)))) # We want buckets of at most target_bw/2 Hz

bin_bw = Fs/nFFTs

minrun = 10

samples_per_block = nFFTs*1000
samples_per_step = samples_per_block - 3 * minrun*nFFTs


def run_step(block_no, hl):
    start0 = block_no*samples_per_step
    xc = fetch_x(start0, samples_per_block)
    
    a = specgram(xc, NFFT=nFFTs, Fs=Fs, noverlap=0, scale_by_freq=False); None
    clf()
    # Build a quick ECDF of this block, as a janky sort of AGC
    pt_sample = a[0].flatten()[ np.random.uniform(low=0, high=a[0].size, size=N_sample).astype("int") ]
    pt_sample.sort()
    
    # Find the p_cutoff'th percentile and use that as our threshold
    th = pt_sample[int(p_cutoff*N_sample)]
 
    # A quick transition-based state machine to find occupied channels
    states = np.zeros(nFFTs).astype("bool")
    starts = np.zeros(nFFTs)

    retval = []

    for i in range(a[0].shape[1]):
        r = (a[0][:,i] > th).astype("bool")
        starts[r & (~states)] = i
        term_is = (states & ~r).nonzero()[0]

        for j in term_is:
            if minrun <= i - starts[j]:
                hl.mark(Fs, 0.0+a[1][j]+bin_bw/2, bin_bw, start0 + nFFTs*starts[j], nFFTs*(i-starts[j]), bw_fudge=max_bw/2)
            starts[j] = 0

        states = r
        
    i = a[0].shape[1]
    term_is = states.nonzero()[0]
    for j in term_is:
        if minrun <= i - starts[j]:
            hl.mark(Fs, 0.0+a[1][j]+bin_bw/2, bin_bw, start0 + nFFTs*starts[j], nFFTs*(i-starts[j]), bw_fudge=max_bw/2)
        starts[j] = 0   

    gc.collect()

    return a[0].shape
    
In [434]:
# We run with this weird inside-out pattern to minimize
# possible memory leaks from the processing above.  Without
# attention to detail here, you can wind up leaking out
# all the RAM in your system.

for block_no in range(file_length_samples/samples_per_step):
    run_step(block_no, hl)
    gc.collect()
    if 0 == block_no % 25:
        print len(hl.hits), file_length_samples, block_no*samples_per_step
    
    
len(hl.hits)
8 120777916 0
517 120777916 6208000
790 120777916 12416000
1121 120777916 18624000
1594 120777916 24832000
1935 120777916 31040000
2371 120777916 37248000
2671 120777916 43456000
2988 120777916 49664000
3442 120777916 55872000
3749 120777916 62080000
4151 120777916 68288000
4466 120777916 74496000
4754 120777916 80704000
5097 120777916 86912000
5323 120777916 93120000
5584 120777916 99328000
5974 120777916 105536000
6261 120777916 111744000
6650 120777916 117952000

Out[434]:
6870
<matplotlib.figure.Figure at 0x7f12c2ec0610>

Merge

Now we've got a bunch of possible packet locations. Unfortunately, we know they're likely to contain redundant Hits, both for packets that straddle the boundaries, as well as for packets whose carrier happened to fall on bin frequency boundaries, and therefore bounced around between bins.

We now run the merge operation, which boils down to "Make each Hit a little bigger in both time and frequency, then see if it overlaps with other packets that are also made a little bigger. When two overlap, make a new Hit that covers both of them and delete the two original Hits. Keep running this process until the number of Hits doesn't change when you run." In theory, you shouldn't see this number change after the first pass, but it's idempotent, so we run it multiple times to make sure we cover any bugs/thinkos in the implementation.

There are so many better ways to do this, but this way is super easy to explain and very simple to code. A much better way to do this is to build a quadtree structure. If you refactor this to do that, could you also make it into a proper gnuradio block while you're at it? Pull requests welcome.

The run_merge() method returns the length before running and the length afterwards, so we can monitor its progress from the outside.

In [435]:
hl.run_merge() # This will take a while; it's doing a lot of heavy lifting
Out[435]:
(6870, 1034)
In [436]:
l0,l1 = hl.run_merge() # Hopefully this is the last merge step, but who knows.
(l0,l1)
Out[436]:
(1034, 942)
In [437]:
# And now we just let it run for a bit.
while l0 != l1:
    l0,l1 = hl.run_merge()
    print (l0,l1)
(l0,l1)
(942, 933)
(933, 933)

Out[437]:
(933, 933)

Packets identified

We should now have some packets identified. Time to see what they look like!

In [438]:
# [ (h.Fc, h.s_low(), h.s_high()) for h in hl.merged[0:20] ]
In [439]:
# [ (h.Fc, h.s_low(), h.s_high()) for h in hl.hits[0:10] ]

Let's pull out the samples for the first packet and see what it looks like.

In [440]:
h = hl.merged[0]

y = fetch_x(h.s_low(), h.s_length())
specgram(y, NFFT=64, noverlap=63, Fs=Fs); None
title(str(h))
Out[440]:
<matplotlib.text.Text at 0x7f12c29b4e10>

Okay, this looks reasonable, but we need to focus in on it. To do that, we're going to use channel extraction to pull out just that bit of bandwidth.

Channel Extraction

Let's add in our channel extractor from previous work. This lets us get to smaller bands around the channel of interest. This just pulls out the band around a given center frequency, and samples it down to a lower bandwidth to make it easier to work with.

In [484]:
def extract_channel(f_sample, f_center, f_bw, x):
    """Extract a channel of width f_bw at f_center, from f_sample
    
    Returns Fs,y, the new sample rate and an array of complex samples
    """
    
    # Cribbing from the GNURadio xlate, we use their block diagram
    
    my_state = {} # cheating backchannel for debug
    
    if   (f_center + f_bw/2.0) > f_sample / 2.0:
        f_bw = f_sample/2.0 - f_center - 1000
    elif (f_center - f_bw/2.0) < -f_sample/2.0:
        f_bw = f_center-f_sample/2 - 100
    
    # Create a LPF for our target bandwidth
    n_taps = 100 # XXX Total random shot in the dark    
    r_cutoff = float(f_bw)/f_sample    
    base_lpf = scipy.signal.remez(n_taps, [0, r_cutoff, r_cutoff*2.0, 0.5], [1,0])
    
    # Shift this LPF up to our center frequency
    fwT0 = 2.0*np.pi*f_center/f_sample
    
    lpf = base_lpf * np.exp(1.0j*fwT0 * np.arange(0, n_taps))
    
    dec_rate = int(f_sample / (2*f_bw))
    
    x_filtered = scipy.signal.lfilter(lpf, 1.0, x)
    
    dec_is = np.arange(0, len(x_filtered), dec_rate)
    y = x_filtered[dec_is]
    y *= np.exp(-1.0j*fwT0*dec_is)
    
    my_state["n_taps"] = n_taps
    my_state["r_cutoff"] = r_cutoff
    my_state["lpf"] = lpf
    my_state["base_lpf"] = base_lpf
    my_state["fwT0"] = fwT0
    my_state["dec_rate"] = dec_rate
    # my_state["dec_is"] = dec_is
    # my_state["x_filtered"] = x_filtered
    
    return f_sample/dec_rate, y, my_state

Now, let's see what that zero'th packet looks like when zoomed in

In [442]:
def plot_h(h):
    y = fetch_x(h.s_low(), h.s_length())
    Fs_yp, yp, _ = extract_channel(Fs, h.Fc, h.bw, y)
    a = specgram(yp, Fs=Fs_yp, NFFT=32, noverlap=31)
    title("%d: %s" % (i,str(h)))
    return { "Fs_yp": Fs_yp, "yp": yp, "a": a }

i = 0
h = hl.merged[i]

plot_h(h)
None

Decoding

Now, let's decode the packets. The goal here is to get to a bitstream that's appropriate for the signal of interest for each packet, which is tricky.

Frequency lock and time trimming

To do this, we need to put the center of our packet at baseband (zero Hz), and cut out the noise at the head and tail of the packet. This can be done similarly to the packet extraction itself, but a bit simpler.

First, we create another ECDF, but this time for the entire downsampled packet buffer. From there, we apply a threshold of 98% (98% is arbitrary, like it was last time), and then find the average frequency that's above the threshold. We also find the length of the head and tail times where no frequency is above the threshold. Note that these sample counts are going to be in the sample rate of this packet's buffer, not the overall system rate (ie they're in units of Fs_yp, not Fs).

Once we have these three numbers, we can really focus in on our packet.

In [443]:
def plot_h(h):
    y = fetch_x(h.s_low(), h.s_length())
    Fs_yp, yp, _ = extract_channel(Fs, h.Fc, h.bw, y)
    a = specgram(yp, Fs=Fs_yp, NFFT=32, noverlap=31)
    clf()
    
    p_threshold = 0.98
    
    energies = a[0].flatten()
    energies.sort()
    
    threshold = energies[int(p_threshold*len(energies))]
    
    in_use = a[0] > threshold
    
    # Weighted mean
    mean_f = np.sum(np.sum(in_use, axis=1) * a[1])/np.sum(in_use)
    
    times_in_use = np.sum(in_use, axis=0) > 0
    head_skip = 0
    for t in times_in_use:
        if t:
            break
        head_skip += 1
        
    
    tail_skip = 0
    for t in times_in_use[::-1]:
        if t:
            break
        tail_skip += 1
    
    
    head_skip_Fs = int(head_skip * Fs / Fs_yp)
    tail_skip_Fs = int(tail_skip * Fs / Fs_yp)

    y = fetch_x(h.s_low()+head_skip_Fs, h.s_length()-tail_skip_Fs)
    Fs_yp, yp, _ = extract_channel(Fs, h.Fc+mean_f, h.bw, y)
    a = specgram(yp, Fs=Fs_yp, NFFT=32, noverlap=31)
    
    title("%d: %s" % (i,str(h)))
    return yp

i = 0
h = hl.merged[i]

yp = plot_h(h)
None

That looks prety good. Now we can get on to the business of decoding. There looks to be a lot of FSK, and there's probably a bunch of PSK too, so our decode_quad() will work again. See the earlier work on FM decoding for how this works.

Quadrature decoding

In [444]:
def decode_quad(x, gain):
    xp = x[1:] * np.conj(x[:-1])
    retval = gain * np.arctan2(np.imag(xp), np.real(xp))
    return retval
In [445]:
b = decode_quad(yp, 10.0)
plot(b)
Out[445]:
[<matplotlib.lines.Line2D at 0x7f12c2d94250>]

That looks a little jagged, so let's put in a LPF.

In [446]:
lpf = scipy.signal.remez(20, [0, 0.2, 0.3, 0.5], [1,0])
b = scipy.signal.lfilter(lpf, 1.0, decode_quad(yp, 10.0))
plot(b)
Out[446]:
[<matplotlib.lines.Line2D at 0x7f12c27e4ad0>]

Extract binary states

Now to extract the bitstream. Since we've got the frequency locked, this is pretty easy.

In [447]:
lpf = scipy.signal.remez(20, [0, 0.2, 0.3, 0.5], [1,0])
b = scipy.signal.lfilter(lpf, 1.0, decode_quad(yp, 10.0))
bitstream = b > 0
plot(bitstream)
ylim(-0.1,1.1)
Out[447]:
(-0.1, 1.1)
In [448]:
lpf = scipy.signal.remez(20, [0, 0.2, 0.3, 0.5], [1,0])
b = scipy.signal.lfilter(lpf, 1.0, decode_quad(yp, 10.0))
bitstream = b > 0
ylim(-0.1,1.1)

# Highlighting the graph below
spans = [ (0, "grey"), (175, "blue"), (750, "red"), (1940, "grey"), (2000, None) ]
for s0,s1 in zip(spans[:-1],spans[1:]):
    axvspan(s0[0], s1[0], color=s0[1], alpha=0.4, lw=0)

plot(bitstream, color="blue", lw=2)

xlim(spans[0][0], spans[-1][0])
Out[448]:
(0, 2000)

In the diagram above, there are four segments, each highlighted. The first and last segments are noise: they're just crap from when there is no actual signal present. The second section, in blue, is the preamble. This is a series of 0101010101... values sent to signal the other side that data is about to come. The third section, in red, is the actual data.

"Clock recovery" (or an approximation thereof)

As a human, you can probaly see the 1s and 0s in the signal above. But, if you look more closely, you'll quickly see that every "1" above is actually a whole bunch of 1s, as is every "0". We've sampled the data a lot more quickly than this signal changes.

In [449]:
plot(bitstream, '-o')
ylim(-0.1, 1.1)
xlim(400, 500)
Out[449]:
(400, 500)

So, we need to figure out how to count the dots in each of the spans above. Note that we can't just do a trivial counting here, either: the signal might have a clock rate that's not an integer divisor of our sampling rate. If we just did the trivial thing there, we'd drift out over time, which is a bummer.

Run-length encoding (RLE)

First step: we need to get a run-length encoding (RLE) of the bit states. If there's a preamble on this packet, we're going to see a bunch of very short runs, each of which is either a single symbol (or half of a symbol if we're Manchester encoded, but I digress).

Note that we don't care whether the signal is high or low, we just want to know how long it stays in each state. So, we just need to walk along, keeping track of how long we've been in the current state, and pushing that off when we change states. However, I'm going to do it a bit differently below, partly because it's cute, and partly because it's about a bajillion times faster to do it this way (python is awful at lists).

In [450]:
def bitlen_rle(b):
    "Returns a RLE of the central binary state lengths, with no concept of parity. Ignores ends."
    edges = (b[:-1] != b[1:]).nonzero()[0]
    # XXX This neglects the first and last segments. We don't really care, but.
    run_lengths = np.diff(edges)
    return run_lengths

With that in hand, let's look at the distribution of lengths in the data above.

In [451]:
b_rle = bitlen_rle(bitstream)
hist(b_rle, bins=range(200)); None

Note the giant spike at 17ish. That's darned close to our symbol rate. Note also the smaller giant spike at 16ish. That's there because our symbol rate isn't 1/17th of our sampling rate. We have a non-integer divisor to deal with here.

Let's also make it a bit easier on ourselves: we expect the preamble to be within the first half of the packet, so let's only look at the first half of the bitstream. Let's also throw out lengths of 1 or 2: changes of length 1 are pure noise, and a divisor of two is dangerously close to the nyquist frequency for this kind of random finding work. Also, our frequency selection mechanism above should actually make it rare to hit the nyquist frequency.

Combining those two simplifications, we can see the peak jump out at us:

In [465]:
b_rle = bitlen_rle(bitstream[0:len(bitstream)/2])
hist(b_rle[b_rle > 2], bins=range(200)); None

Now, we just need to automate extracting that peak and then figuring out the average symbol length. Once we have the peak, we'll just take one bin on either side of it and average the values all together. This should get us a reasonable estimate.

In [464]:
max_bit_length = min(250, len(bitstream)/10) # let's be reasonable here...
bitlen_counts = np.zeros(max_bit_length)
b_rle = bitlen_rle(bitstream[0:len(bitstream)/2])

for l in b_rle:
    if l > 2 and l < max_bit_length:
        bitlen_counts[l] += 1
    
peak_bitlen = np.argmax(bitlen_counts)
pb_is = np.array([peak_bitlen-1, peak_bitlen, peak_bitlen+1])
# Another weighted average, what a beautiful day!
plausible_bitlen = np.sum(pb_is * bitlen_counts[pb_is])/np.sum(bitlen_counts[pb_is])

plausible_bitlen
Out[464]:
3.0

So, that's our most likely bit length. Out of idle curiosity, what bitrate is that?

In [454]:
Fs_yp/plausible_bitlen
Out[454]:
9594.813614262559

Well, then. 9600 bits per second. We got lucky to find someone into the classics.

Now to pull out the actual bits. First, let's do the not-quite-trivial thing, and select the bits based on rounding the current sample index.

In [463]:
def janky_decimate(x, decim):
    return x[np.round(np.arange(0,len(x)-1,decim)).astype("int")]

jbits = janky_decimate(bitstream, plausible_bitlen)
jb_rle = bitlen_rle(jbits)
b_rle = bitlen_rle(bitstream)

plot(b_rle[36:]) # sorry about this kludge here...
plot(jb_rle*plausible_bitlen, 'o')
Out[463]:
[<matplotlib.lines.Line2D at 0x7f12c26a0f50>]

Oh hey, that seems to have worked! It might not work all the time, but it's good enough for our current "capture all the packets and let the compute sort them out" technique. Now to bundle it up into a useful helper routine.

In [456]:
def extract_base_bitstream(bitstream, Fs):
    max_bit_length = min(250, len(bitstream)/10) # let's be reasonable here...
    bitlen_counts = np.zeros(max_bit_length)
    b_rle = bitlen_rle(bitstream[0:len(bitstream)/2])

    for l in b_rle:
        if l > 2 and l < max_bit_length:
            bitlen_counts[l] += 1

    peak_bitlen = np.argmax(bitlen_counts)
    pb_is = np.array([peak_bitlen-1, peak_bitlen, peak_bitlen+1])
    # Another weighted average, what a beautiful day!
    plausible_bitlen = np.sum(pb_is * bitlen_counts[pb_is])/np.sum(bitlen_counts[pb_is])

    bits = janky_decimate(bitstream, plausible_bitlen)
    Fs_return = Fs/plausible_bitlen
    
    return Fs_return, bits
In [457]:
baudrate, bits = extract_base_bitstream(bitstream, Fs_yp)
plot(bits)
title("%d per second" % int(baudrate))
Out[457]:
<matplotlib.text.Text at 0x7f12c2529150>

Now, to bundle it all up into the single hit decoder.

In [458]:
def decode_h(h):
    "Decodes the bitstream associated with Hit h, returns (baudrate,bitstream)"
    y = fetch_x(h.s_low(), h.s_length())
    Fs_yp, yp, _ = extract_channel(Fs, h.Fc, h.bw, y)
    a = specgram(yp, Fs=Fs_yp, NFFT=32, noverlap=31)
    clf()
    
    p_threshold = 0.98
    
    energies = a[0].flatten()
    energies.sort()
    
    threshold = energies[int(p_threshold*len(energies))]
    
    in_use = a[0] > threshold
    
    # Weighted mean
    mean_f = np.sum(np.sum(in_use, axis=1) * a[1])/np.sum(in_use)
    
    times_in_use = np.sum(in_use, axis=0) > 0
    head_skip = 0
    for t in times_in_use:
        if t:
            break
        head_skip += 1
        
    
    tail_skip = 0
    for t in times_in_use[::-1]:
        if t:
            break
        tail_skip += 1
    
    
    head_skip_Fs = int(head_skip * Fs / Fs_yp)
    tail_skip_Fs = int(tail_skip * Fs / Fs_yp)

    #########################
    # Re-extract the new data
    y = fetch_x(h.s_low()+head_skip_Fs, h.s_length()-tail_skip_Fs)
    Fs_yp, yp, _ = extract_channel(Fs, h.Fc+mean_f, h.bw, y)

    ##########################
    # Decode to the sample-rate bitstream
    lpf = scipy.signal.remez(20, [0, 0.2, 0.3, 0.5], [1,0])
    b = scipy.signal.lfilter(lpf, 1.0, decode_quad(yp, 10.0))
    bitstream = b > 0
    
    #########################
    # Decode to the base rate bitstream
    baudrate, target_bitstream = extract_base_bitstream(bitstream, Fs_yp)

    return baudrate, target_bitstream

i = 0
h = hl.merged[i]

baudrate, bitstream = decode_h(h)
plot(bitstream)
title(baudrate)
Out[458]:
<matplotlib.text.Text at 0x7f12c262cd50>

Let's finally run all of our packets through this thing! This might take a while, so we use a conservative approach, and log off our exceptions to find bugs/gotchas in the code above for later investigation. So long as we know which indexes caused the problems, we can always re-run it later to debug.

In [485]:
packets = []

errors = []
for i,h in enumerate(hl.merged):
    try:
        packets.append( [i, decode_h(h) ])
    except Exception, e:
        errors.append( [ i, str(e) ])
        #print str(i) + " Exception: " + str(e)

hist([ p[0] for p in packets]); None
len(errors)
8 Exception: Bands must be monotonic starting at zero.
9 Exception: attempt to get argmax of an empty sequence
13 Exception: Bands must be monotonic starting at zero.
36 Exception: Failure to converge after 25 iterations.
      Design may still be correct.
55 Exception: index 4 is out of bounds for size 4
64 Exception: Bands must be monotonic starting at zero.
65 Exception: Bands must be monotonic starting at zero.
66 Exception: index 5 is out of bounds for size 5
69 Exception: Bands must be monotonic starting at zero.
112 Exception: index 4 is out of bounds for size 4
120 Exception: Bands must be monotonic starting at zero.
124 Exception: Bands must be monotonic starting at zero.
140 Exception: Failure to converge after 25 iterations.
      Design may still be correct.
176 Exception: index 5 is out of bounds for size 5
177 Exception: Bands must be monotonic starting at zero.
178 Exception: Bands must be monotonic starting at zero.
181 Exception: Bands must be monotonic starting at zero.
187 Exception: Bands must be monotonic starting at zero.
194 Exception: Bands must be monotonic starting at zero.
196 Exception: Bands must be monotonic starting at zero.
232 Exception: index 5 is out of bounds for size 5
236 Exception: Bands must be monotonic starting at zero.
237 Exception: Bands must be monotonic starting at zero.
239 Exception: Bands must be monotonic starting at zero.
246 Exception: Failure to converge after 25 iterations.
      Design may still be correct.

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-485-87a17e85b0ab> in <module>()
      2 for i,h in enumerate(hl.merged):
      3     try:
----> 4         packets.append(decode_h(h))
      5     except Exception, e:
      6         print str(i) + " Exception: " + str(e)

<ipython-input-458-ac66f8b33ed3> in decode_h(h)
      3     y = fetch_x(h.s_low(), h.s_length())
      4     Fs_yp, yp, _ = extract_channel(Fs, h.Fc, h.bw, y)
----> 5     a = specgram(yp, Fs=Fs_yp, NFFT=32, noverlap=31)
      6     clf()
      7 

/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc in specgram(x, NFFT, Fs, Fc, detrend, window, noverlap, cmap, xextent, pad_to, sides, scale_by_freq, hold, **kwargs)
   3135              pad_to=None, sides='default', scale_by_freq=None, hold=None,
   3136              **kwargs):
-> 3137     ax = gca()
   3138     # allow callers to override the hold state by passing hold=True|False
   3139     washold = ax.ishold()

/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc in gca(**kwargs)
    801     """
    802 
--> 803     ax =  gcf().gca(**kwargs)
    804     return ax
    805 

/usr/lib/pymodules/python2.7/matplotlib/figure.pyc in gca(self, **kwargs)
   1219 
   1220         # no axes found, so create one which spans the figure
-> 1221         return self.add_subplot(1, 1, 1, **kwargs)
   1222 
   1223     def sca(self, a):

/usr/lib/pymodules/python2.7/matplotlib/figure.pyc in add_subplot(self, *args, **kwargs)
    912                     self._axstack.remove(ax)
    913 
--> 914             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
    915 
    916         self._axstack.add(key, a)

/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in __init__(self, fig, *args, **kwargs)
   9257 
   9258         # _axes_class is set in the subplot_class_factory
-> 9259         self._axes_class.__init__(self, fig, self.figbox, **kwargs)
   9260 
   9261     def __reduce__(self):

/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in __init__(self, fig, rect, axisbg, frameon, sharex, sharey, label, xscale, yscale, **kwargs)
    447 
    448         # this call may differ for non-sep axes, eg polar
--> 449         self._init_axis()
    450 
    451         if axisbg is None:

/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in _init_axis(self)
    507         self.spines['bottom'].register_axis(self.xaxis)
    508         self.spines['top'].register_axis(self.xaxis)
--> 509         self.yaxis = maxis.YAxis(self)
    510         self.spines['left'].register_axis(self.yaxis)
    511         self.spines['right'].register_axis(self.yaxis)

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in __init__(self, axes, pickradius)
    652         self._minor_tick_kw = dict()
    653 
--> 654         self.cla()
    655         self._set_scale('linear')
    656 

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in cla(self)
    740         self._set_artist_props(self.label)
    741 
--> 742         self.reset_ticks()
    743 
    744         self.converter = None

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in reset_ticks(self)
    753         cbook.popall(self.minorTicks)
    754 
--> 755         self.majorTicks.extend([self._get_tick(major=True)])
    756         self.minorTicks.extend([self._get_tick(major=False)])
    757         self._lastNumMajorTicks = 1

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in _get_tick(self, major)
   1909         else:
   1910             tick_kw = self._minor_tick_kw
-> 1911         return YTick(self.axes, 0, '', major=major, **tick_kw)
   1912 
   1913     def _get_label(self):

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in __init__(self, axes, loc, label, size, width, color, tickdir, pad, labelsize, labelcolor, zorder, gridOn, tick1On, tick2On, label1On, label2On, major)
    140         self.tick1line = self._get_tick1line()
    141         self.tick2line = self._get_tick2line()
--> 142         self.gridline = self._get_gridline()
    143 
    144         self.label1 = self._get_text1()

/usr/lib/pymodules/python2.7/matplotlib/axis.pyc in _get_gridline(self)
    553                     linewidth=rcParams['grid.linewidth'],
    554                     alpha=rcParams['grid.alpha'],
--> 555                     markersize=0
    556                     )
    557 

/usr/lib/pymodules/python2.7/matplotlib/lines.pyc in __init__(self, xdata, ydata, linewidth, linestyle, color, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs)
    206         self.set_linestyle(linestyle)
    207         self.set_drawstyle(drawstyle)
--> 208         self.set_linewidth(linewidth)
    209         self.set_color(color)
    210         self._marker = MarkerStyle()

/usr/lib/pymodules/python2.7/matplotlib/lines.pyc in set_linewidth(self, w)
    787         ACCEPTS: float value in points
    788         """
--> 789         self._linewidth = w
    790 
    791     def set_linestyle(self, linestyle):

KeyboardInterrupt: 
<matplotlib.figure.Figure at 0x7f12c22f6210>
In [501]:
len(packets), len(hl.merged)
Out[501]:
(222, 933)

Let's save off our work thus far for later use, or use in other subprograms.

In [505]:
Fs
Out[505]:
4000000.0
In [506]:
full_capture_state = {}

# Some raw info about our input file
full_capture_state["filename"] = FILENAME
full_capture_state["sample_size"] = SAMPLE_SIZE
full_capture_state["Fs"] = Fs
full_capture_state["Fc"] = Fc

# Our hit lists
full_capture_state["raw_hits"] = [ { "Fc": h.Fc, "bw": h.bw, "s_low": h.s_low(), "s_length": h.s_length() } for h in hl.hits ]
full_capture_state["merged_hits"] = [ { "Fc": h.Fc, "bw": h.bw, "s_low": h.s_low(), "s_length": h.s_length() } for h in hl.merged ]

# And our packet list
full_capture_state["packets"] = [ { "hit_i": p[0], "baudrate": p[1], "bitstream": list(p[2].astype("int"))  } for p in packets ]

import time
filename = "capture_state_%d.json" % int(time.time())
print "Dumping state to the file %s" % filename
f = file(filename, "w")
json.dump(full_capture_state, f)
f.close()

s = os.stat(filename)
print "Got output file for %s: %d bytes long" % (filename, s.st_size)
Dumping state to the file capture_state_1421779406.json
Got output file for capture_state_1421779406.json: 682938 bytes long

In []:
 
In [487]:
plot([p[0]+1 for p in packets], [len(p[1])+1 for p in packets], 'o')
loglog()
xlim(8000,200000)
xlabel("Symbol rate")
ylabel("Rough packet length")
title("Packets observed approximately 10 seconds, 910~914MHz" )
In [489]:
plot([p[0]+1 for p in packets], [4e6*(len(p[1])+1)/p[0] for p in packets], 'o')
Out[489]:
[<matplotlib.lines.Line2D at 0x7f12c1c56510>]
In [490]:
def plot_packet(i,p):
    plot(p[1])
    ylim(-0.1,1.1)
    title("%d: %d" % (i, int(p[0])))
In [490]:
 
In [497]:
for i,p in enumerate(packets):
    try:
        plot_packet(i, packets[i])
    except:
        print "Exception at " + str(i)
    savefig("packet_%d.png" % i)
    clf()
Exception at 11
Exception at 60
Exception at 113
Exception at 165
Exception at 214
Exception at 216

<matplotlib.figure.Figure at 0x7f12c1871e50>
In []: