SoFee 2.0


For the past few months I have not published anything publicly, but in drafts I was writing a small story. It took a long time to put everything together, and I now have a rough draft of the story in place. But it is still not finished finished. I think it is the same as personal software projects and experiments: never ending. There is always something to improve, fix, refactor or rewrite.

On that idea of ongoing projects, I will be picking up SoFee again, with the same features I was aiming for in the first version. This time I want to make them modular, so that they can work with each other as and when needed.

I was also thinking of using Clojure this time. On that, Punchagan correctly reminded me how fixing the language up front could be counterproductive for the project. In my first attempt at SoFee, I had decided to use python3 while many of the web-page parsing libraries were still python2 only. I spent a long time porting an abandoned WARC archiving library to python3, and in the end that feature was not even shipped. So despite the temptation, as Punchagan suggested, it is best to look for the best library available irrespective of language and put together a minimal BUT complete module which can:

  1. Archive a link locally.
  2. Revisit those archives without an internet connection.
  3. Index the archives, make them searchable.
  4. Possibly a command line utility which can be extended with REST endpoints.
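To make the first three points concrete, here is a minimal sketch using only the Python standard library. It is not the SoFee design: the table layout and the names `open_archive`, `archive_html`, `archive_link` and `search` are made up for illustration, and the search is a plain substring match (SQL `LIKE`) rather than a real index.

```python
import sqlite3
import urllib.request
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def open_archive(path=":memory:"):
    """Open (or create) the archive; pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, fetched_at TEXT, html TEXT, body TEXT)"
    )
    return conn


def archive_html(conn, url, html):
    """Store raw HTML (for offline revisits) plus extracted text (for search)."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)",
        (url, fetched_at, html, html_to_text(html)),
    )
    conn.commit()


def archive_link(conn, url):
    """Fetch a URL over the network and archive it."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    archive_html(conn, url, html)


def search(conn, term):
    """Return archived URLs whose extracted text contains term."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (f"%{term}%",)
    )
    return [url for (url,) in rows]
```

Keeping `archive_html` separate from the network fetch means the storage and search parts can be exercised without an internet connection, which is the same property the module itself is supposed to have.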

After that, I will pick up remaining features and try to build this, block by block.

One block at a time.

Youtube: a (healthy?) supplement for study material


In the past, I have visited NID to help students with their Diploma projects. At times some students would show up with buggy code whose source was a youtube video. It always baffled me. Students were trying to shortcut the learning: they were writing code by pausing a video at a certain point and hoping that the code would work for them just like it is shown in the video. There are so so so many issues with youtube as a reference for learning material. Something like:

  1. With programming, it can be particularly hard to cite and use an exact timestamp as reference. The video content and voice are not indexed. One can't find the way back to a video without the right keywords, and those are often limited.
  2. Hard to reproduce the environment which the presenter is using in the video. Often exact versions, OS details or other needed setup are missing from the 'details' section.
  3. Videos can get blocked for random copyright violations or even get taken down.

Having said that, recently I got myself enrolled in an LLB course in a local government college. The college is pretty relaxed about the schedule, curriculum, assignments, and time commitments. Regular classes happen but students are not "required" to attend them. There were very few basic requirements or assignments. I myself attended 4 classes in all, and one of them was on the day right before an exam was scheduled. There was some confusion around the optional subject and the class was meant to give a rough overview of the syllabus. The downside of this approach was that I had to self-study most of the things and score basic passing marks (half-assing a professional degree). I was mostly relying on solved question papers from the last few years to get an understanding of the subjects. Some teachers also helped out in the process. But I think the most crucial part of the preparation was youtube videos, made by some random teachers, students, and professionals. They had recorded many small videos covering the basics of certain Acts, where they apply, relevant cases and exceptions.

Listing down some references which were helpful(Shoutouts):

While these videos were really helpful, I still think that for serious learning they are not good material compared to books, bare-acts and manuals, which can be referred back to and revisited quickly.

Last minute...


Yesterday night I had a train to catch for Delhi. The train's departure time was ~10PM and that late, it's hard to get autos to the station in Bikaner. So I reached early, almost an hour before. Generally I have an SMS from IRCTC (Indian Railways' online booking system) as a handy reference for seat details and as a ticket. But the system can be unreliable at times and I didn't get the SMS when I booked my tickets. I remembered the coach and seat number, so I reached in front of my coach and kept goofing around, chatting with friends. The coach was still locked from inside; ground staff generally open them around half an hour before the scheduled departure. Roughly around 9:45PM they opened it and I boarded the train to put my luggage. The whole coach was empty. As I took my seat and got comfortable, I thought, let's get the ticket SMS, I will need it to show to the Ticket Checker.

I logged into IRCTC, went to my "Booked tickets History" section, to the ticket, "GET SMS", and click. I looked at my mobile in anticipation of a notification and soon there was, "Tingggggggggg". I checked the message summary on the phone and noticed DEE-BKN instead of the expected BKN-DEE. I was like, "Hain?". I confirmed it, in the SMS and on the IRCTC platform, and I looked around at the empty bogie, realising, shit, I booked the wrong ticket. Time to get a ticket in "Current booking"; I was already logged in. I tried to book a ticket for the train, but the system wasn't allowing me to book online. I would now have to rush to the ticket counter and get a ticket from there. I picked up my luggage and rushed to the counter on the platform, and the person sitting there said, you can't get a reservation ticket from here, you can get it only from the counter on the first platform, and hurry, even that counter closes at 10PM. I rushed to the first platform and to one of its counters, where the person was playing PubG on his mobile. I said, I want a ticket for Delhi, and he replied that no tickets are available. I confirmed again, in all the classes? He replied that seats are available only in 2nd AC. I knew he was lying but I had no choice. I said, please, can you give me one? He handed me a form to fill. I started doing that and I also tried to cajole him into starting the process to speed things up, but he didn't budge. Eventually I gave him the form and the money, in exact change, and he gave me the ticket, and I walked out at 9:57 PM with a confirmed ticket for Delhi. Phew.

Unittest objects available on DBus: part 2


This is a follow up of my previous post. Basically I am learning things about mock library and DBus as we are removing technical debt, addressing our pain points and making product stable.

Partial mock DBus to unittest Object exported over DBus

Last time I wrote about the pain of testing an object available on DBus. I got it reviewed with my teammates. As I paired with one of them, he agreed that with such a setup these tests are not unittests but more like system or integration tests.

First thing, let me share how I am exporting Object on dbus:

#!/usr/bin/env python3

import dbus
import dbus.service
import json

INTERFACE_NAME = "org.example.ControlInterface"

class ControlObject(dbus.service.Object):
    def __init__(self, bus_name, conf):
        super().__init__(bus_name, "/org/example/ControlObject")
        self._conf = conf
        print("Started service")

    @dbus.service.method(INTERFACE_NAME,
                         in_signature='s', out_signature='b')
    def Update(self, conf):
        try:
            self._conf = json.loads(conf)
        except json.JSONDecodeError:
            print('Could not parse json')
            raise
        # out_signature is 'b', so report success explicitly
        return True

    @dbus.service.method(INTERFACE_NAME,
                         in_signature='s',
                         out_signature='s')
    def Get(self, key=''):
        if key == '':
            # an empty string is a valid dict key in python, but it is
            # not a valid key for this interface
            raise KeyError(key)
        # a missing key (KeyError) or a non-dict conf (TypeError)
        # propagates to the caller
        return json.dumps(self._conf[key])

This Object takes a bus_name argument for initializing:

try:
    bus_name = dbus.service.BusName("org.example.ControlInterface",
				    bus=dbus.SystemBus(),
				    do_not_queue=True)
except dbus.exceptions.NameExistsException:
    logger.info("BusName is already used by some different service")
else:
    ControlObject(bus_name, {})

This way of setting things up coupled my Object tightly to the DBus setup, as I have to pass this bus_name as an argument. As I was getting it reviewed with another colleague, he mentioned that I should be able to patch dbus and possibly get around the way I was setting up the system with DBus, exporting the object to it and then running tests.

I had used partial mocking before, using the with construct of mock, so I put together the following test using it:

from unittest import mock
from unittest import TestCase
import json
from service import ControlObject

class TestService(TestCase):
    def setUp(self):
        with mock.patch('service.dbus.SystemBus') as MockBus:
            self.obj = ControlObject(MockBus, {})

    def tearDown(self):
        del self.obj

    def test_object_blank_state(self):
        self.assertFalse(self.obj._conf)

    def test_object_update_random_string(self):
        exception_string = 'Expecting value'
        with self.assertRaises(json.JSONDecodeError) as context:
            self.assertFalse(self.obj.Update(''))
        self.assertIn(exception_string, context.exception.msg)
        self.assertFalse(self.obj._conf)

        with self.assertRaises(json.JSONDecodeError) as context:
            self.assertFalse(self.obj.Update('random string'))
        self.assertIn(exception_string, context.exception.msg)
        self.assertFalse(self.obj._conf)

    def test_object_update(self):
        conf = {'id': 'id',
                'name': 'name',
                'station': 'station'}
        self.obj.Update(json.dumps(conf))

        self.assertTrue(self.obj._conf)
        for key in conf:
            self.assertTrue(key in self.obj._conf)
            self.assertEqual(conf[key], self.obj._conf[key])

This test worked directly and there was no need to set up a dbus-session on the docker image, run a process where I export the object and call methods over DBus. Instead, I can now directly access all attributes and methods of the Object and write better tests.

$ python -m unittest -v test_service.py
test_object_blank_state (test_service.TestService) ... ok
test_object_update (test_service.TestService) ... ok
test_object_update_random_string (test_service.TestService) ... Could not parse json
Could not parse json
ok

----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK

Another way(Better?) to export Object over DBus

While writing this post and exploring examples from dbus-python I found another way to write class which can be exported over DBus:

class ControlObject(dbus.service.Object):
    def __init__(self, conf, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._conf = conf
        print("Started service")

Now we don't even need to claim BusName and pass it as argument to this class. Instead we can make this Object available on DBus by:

system_bus = dbus.SystemBus()
bus_name = dbus.service.BusName('org.example.ControlObject',
                                system_bus)
ControlObject({}, system_bus, '/org/example/ControlObject')

And when we don't want to use DBus and just want an instance of this Object, we can create it directly by calling ControlObject({}). With this way of initializing, we don't need to partially mock DBus and can write unittests directly.
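One way to push this decoupling a step further is to keep the behaviour in a plain class that never imports dbus, and let the exported object only delegate to it. The sketch below is an illustration, not the code from the project: `ConfStore` and its method names are hypothetical.

```python
import json


class ConfStore:
    """Plain Python object holding the configuration; knows nothing about DBus."""

    def __init__(self, conf=None):
        self._conf = conf if conf is not None else {}

    def update(self, raw):
        # json.loads raises json.JSONDecodeError on malformed input,
        # in which case the previous configuration is left untouched
        self._conf = json.loads(raw)
        return True

    def get(self, key):
        if key == "":
            raise KeyError("empty key")
        return json.dumps(self._conf[key])


# A DBus-facing class would then only forward calls, for example:
#
#     def Update(self, conf):
#         return self._store.update(conf)
#
# so the logic above can be tested without any bus or mocking.
```

With this split, tests for parsing and lookup never touch DBus, and only a thin layer is left for the integration tests.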

Unittest object/interface available on DBus


There is better way to do this.

This is in continuation of one of my old post about writing unittests and mocking. In this post we will cover three points:

Write unittest for a DBus Object

Ideally an object exposed over DBus is a regular object which can be created, accessed and tested like any other normal class in python. We had created our object/interface based on examples in dbus-python and from this blog post series. There, DBus is inherently coupled with the class, making it impossible to create an independent object. For example, I am using special decorators to expose methods over DBus. Because of this limitation, while writing unittests for this class we ran into a unique situation where the DBus service has to be running while we test.

Furthermore, DBus objects need an ever-running eventloop over which they are made available. dbus-python uses the GLib main-loop, so we need to figure out a way by which, when we run tests, we are able to start this eventloop, make our object available over it, and then run unittests against it. While looking for an answer, StackOverflow came to the rescue: I came across this thread, and one of the participants, whose answer/comment contributed to the final solution, says:

This is easy, not hard. And you MUST do it. Don't ever skimp on unit test just because someone tells you to, and it is really depressing that people give this kind of advice. Yes you should test your stuff without dbus, and yes you should test it with dbus.

The solution is to start an independent process where the DBus object is initialized and connected to an eventloop running in that process, before running the unittests. Here is sample code, similar to the solution suggested on StackOverflow:

import os
import signal
import subprocess
import sys
import time
import unittest

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

# DemoService (the dbus.service.Object subclass under test) is defined
# in the project and is not shown here.

class TestServices(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # we start eventloop and make our class available on it
        cls.p = subprocess.Popen(['python3', '-m', 'test_services', 'server'])
        # This was needed to wait for service to become available
        time.sleep(2)
        assert cls.p.stdout is None
        assert cls.p.stderr is None

    @classmethod
    def tearDownClass(cls):
        # This is needed to clean up event loop we started in
        # setUpClass
        os.kill(cls.p.pid, signal.SIGTERM)
        cls.p.wait()

    def setUp(self):
        bus = dbus.SessionBus()
        self.handler = bus.get_object("example.org",
                                      "/example/org/DemoService")

    def test_add_component_random_strings(self):
        success, message = self.handler.demo_method('random string')
        self.assertFalse(success)

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == "server":
        # the main loop has to be set as default before connecting to the bus
        DBusGMainLoop(set_as_default=True)
        loop = GLib.MainLoop()
        bus_name = dbus.service.BusName("example.org",
                                        bus=dbus.SessionBus(),
                                        do_not_queue=True)
        DemoService(bus_name)
        try:
            loop.run()
        except KeyboardInterrupt:
            pass
        loop.quit()
    else:
        unittest.main()
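The fixed two second sleep in setUpClass can be replaced by polling until the service shows up. Below is a small sketch of such a helper; `wait_until` is a hypothetical name, and the predicate is left to the caller.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the test above the predicate could, for example, be `lambda: dbus.SessionBus().name_has_owner("example.org")`, assuming the bus connection exposes name_has_owner as it does in the dbus-python versions I am aware of. I have not verified that against this exact setup.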

Getting dbus running on Travis-CI

We have Travis-CI and docker setup for running tests. With docker as I tried to run tests, it failed with:

======================================================================
ERROR: test_add_component_random_strings (test_services.TestServices)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/app/test_services.py", line 135, in test_add_component_random_strings
    bus = dbus.SessionBus()
  File "/usr/lib/python3/dist-packages/dbus/_dbus.py", line 211, in __new__
    mainloop=mainloop)
  File "/usr/lib/python3/dist-packages/dbus/_dbus.py", line 100, in __new__
    bus = BusConnection.__new__(subclass, bus_type, mainloop=mainloop)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 122, in __new__
    bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NotSupported: Unable to autolaunch a dbus-daemon without a $DISPLAY for X11

We are exposing our object over SessionBus, so we have to start DBus session-bus on docker image to run our unittest. I looked for other python based repository on github using DBus and passing travis-ci tests. I came across this really well written project: pass_secret_service and in its Makefile we found our solution:

bash -c 'dbus-run-session -- python3 -m unittest -v'

dbus-run-session starts a SessionBus on the docker image and that's exactly what we wanted. Apart from this solution, this project had an even better and cleaner way to unittest: it has a decorator which takes care of exporting the object over DBus. So far, I haven't been able to get this approach working for me. The project uses pydbus instead of dbus-python; ideally I should be able to shift to it, but I will have to try that.
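When the same test module is also run outside of dbus-run-session (say, on a laptop without a session bus), the DBus tests can be skipped instead of failing with the autolaunch error. A rough sketch follows. It assumes the bus address is exported through the DBUS_SESSION_BUS_ADDRESS environment variable, which is what dbus-run-session does for the command it runs; a bus reachable by other means would be missed by this check. The names `session_bus_available`, `requires_session_bus` and `TestOverDBus` are made up for the example.

```python
import os
import unittest


def session_bus_available(environ=None):
    """Best-effort check: is a session bus address exported in the environment?"""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DBUS_SESSION_BUS_ADDRESS"))


requires_session_bus = unittest.skipUnless(
    session_bus_available(),
    "no DBus session bus found; run the tests under dbus-run-session",
)


@requires_session_bus
class TestOverDBus(unittest.TestCase):
    def test_bus_address_is_exported(self):
        # Placeholder: a real test would call the exported object here.
        self.assertTrue(session_bus_available())
```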

Mock object which are accessible to DBus object

Generally when we need to mock a behaviour, we can use the patch decorator from the mock library and set the relevant behaviour (the return_value or side_effect attributes). But given the peculiarity of the above setup, the DBus object runs in a different process from the tests. So mocking behaviour around the unittest won't work, because the DBus object won't have access to these mocked objects. To get around this we need to mock things just before we start the MainLoop and create the DemoService object:

# Patch targets are abbreviated here; mock.patch expects a dotted path
# such as 'package.module.name'.
with mock.patch('hue.Bridge') as MockBridge:
    with mock.patch('configparser') as mock_config:
        with mock.patch('requests') as mock_requests:
            MockBridge.return_value.get_light.return_value = lights_dict
            loop = GLib.MainLoop()
            DBusGMainLoop(set_as_default=True)
            bus_name = dbus.service.BusName("example.org",
                                            bus=dbus.SessionBus(),
                                            do_not_queue=True)
            DemoService(bus_name)
            try:
                loop.run()
            except KeyboardInterrupt:
                pass
            loop.quit()
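Why patches made in the test process never reach the service process can be seen with a small experiment that needs no DBus at all. The sketch below patches socket.gethostname in the current process and then asks a freshly started interpreter for the same value. The function name and the choice of gethostname are only for illustration.

```python
import socket
import subprocess
import sys
from unittest import mock

CHILD_CODE = "import socket; print(socket.gethostname())"


def hostname_in_parent_and_child():
    """Patch socket.gethostname here, then ask a fresh interpreter for it."""
    with mock.patch("socket.gethostname", return_value="mocked-host"):
        parent_value = socket.gethostname()
        child_value = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return parent_value, child_value
```

The parent sees "mocked-host" while the child prints the real hostname, because the patch only replaces an attribute in the parent's memory. This is the reason the mocks have to be set up inside the server branch, in the process that actually creates the DBus object.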

Rushing in with bug fixes


After the release of integration of Nuimo-Click with Senic Hub, we came across this bug where if a paired Sonos speaker went down(unplugged, IP changes), both Nuimo-Control and Nuimo-Click will become unresponsive. Nuimo-Control shows an X, on its LED display matrix when something goes wrong.

404.jpg

Look and feel of Nuimo-Click is very similar to traditional switches, which rarely malfunction. While carrying that impression(expectation that it will always work), when user realizes that click is not working, it irks.

nuimo-click.jpg

We are using DBus for managing connection between smart home devices and the controllers. For communicating with Sonos we use websockets. In above mentioned situation, when Sonos was not reachable, there was no timeout for such requests and senic-core DBus API throws following error:

nuimo_app ERROR [MainProcess-ipc_thread][dbus.proxies:410] Introspect error on :1.16:/ComponentsService: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Given that there was no timeout, this error/exception is not instant; it takes a while for the DBus API to raise it. In the meantime, the hub remains in a state of suspension. Also, events are queued for processing, so this issue was also clogging up events. As mentioned earlier, this was bad UX, sort of a show-stopper, and on top of that we were on a tight schedule to ship. I rushed in with a first PR, adding a timeout for all DBus method calls. At all the places where I was calling DBus methods, I wrapped them in a try, except block, passed an extra timeout argument and handled the DBusException. This fix, although it worked, was HUGE. The changeset sprawled through a lot of files. It was not comfortable to get it reviewed thoroughly and to make sure that I was not introducing new regressions (we are still in the process of adding more unittests to the stack, maybe I will write a post around it).

Initially when we were working on design of senic-core, we had decided to handle exceptions at their point of origin. With respect to that decision, my first PR(impulsive fix), was in totally wrong direction. While I waited for reviews and comments, I started looking into websocket library, which was waiting forever for reply from a host which was not reachable. As I googled for terms like websocket client, timeout, I quickly came across lot of links and pointers on how to set a timeout for a websocket connection. This was promising. I quickly put together something as simple as:

# library creating websocket connection
REQUEST_TIMEOUT = 5
ws = websocket.create_connection(
	    url, header=headers,
	    sslopt=sslopt, origin=origin
	)
ws.settimeout(REQUEST_TIMEOUT)

# connection instance will now raise WebSocketTimeoutException

The Sonos library interaction already had a set of try, except blocks; I added WebSocketTimeoutException to those exceptions and handled it right there. This fix was small, precise and comforting in a way. I tested the fix by unplugging the sonos speaker and interacting with Nuimo-Control, and within 5 seconds (the timeout) I noticed the X on it. I additionally confirmed that the system was able to work with Hue and that interactions weren't clogging up. It was easier to get it reviewed by a colleague, and it got verified and merged.
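The same idea, a timeout set on the connection and handled right where the request is made, can be shown with nothing but the standard library socket module. This is an analogy under that assumption, not the websocket client code from the fix, and `request_with_timeout` is a hypothetical helper.

```python
import socket

REQUEST_TIMEOUT = 5  # seconds, same value as in the snippet above


def request_with_timeout(host, port, payload, timeout=REQUEST_TIMEOUT):
    """Send payload and return the reply, or None if the peer stays silent."""
    try:
        # the timeout covers connecting as well as the later send/recv calls
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            return conn.recv(4096)
    except socket.timeout:
        # handled at the point of origin; callers only see "no reply"
        return None
```

Other failures, such as a refused connection, are deliberately left to propagate so that they are not mistaken for a slow device.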

SBjpgScaled.jpg

At times, symptoms can be distracting and putting a bandage won't fix the leak. Don't rush, take your time, identify root of the problem, and then fix it.

This situation also made us think about how to improve the way we are using DBus APIs. We had put together the first working version of API following a blog series around the same subject and from examples which were shipped with dbus-python (Python bindings for D-Bus) package. There is lot of room for improvement. We tried to understand better on how to use these APIs, document things and stress test them. I will write about them too, sometime.

PS: The included pencil sketch work was done by Sudarshan

Lets unittest using Python Mock, wait, but what to Mock?


At Senic, on our Hub, we use Supervisord for managing applications. I am not sure about its python3 compatibility, but it is one of the reasons we still have a dependency on Python2.7. Given that the Python2.7 life support clock is ticking, we recently merged big changes to use Systemd instead. I came across this small, clean, Python API for managing systemd services. We included it in our stack and I wrote a small utility function for it:

import logging
from sysdmanager import SystemdManager
import dbus.exceptions


logger = logging.getLogger(__name__)

def manage_service(service_name, action):
    '''This function is to manage systemd units passed to it in
    service_name argument. It will try to stop/start/restart unit
    based on value passed in action.
    '''
    try:
        systemd_manager = SystemdManager()
    except dbus.exceptions.DBusException:
        logger.exception("Systemd service is not accessible via DBus")
    else:
        if action == "start":
            if not systemd_manager.start_unit(service_name):
                logger.info("Failed to start {}".format(service_name))
        elif action == "stop":
            if not systemd_manager.stop_unit(service_name):
                logger.info("Failed to stop {}".format(service_name))
        elif action == "restart":
            if not systemd_manager.restart_unit(service_name):
                logger.info("Failed to restart {}".format(service_name))
        else:
            logger.info(
                "Invalid action: {} on service: {}".format(action, service_name)
            )

With this in place, manage_service can be imported in any module and restarting any service is just manage_service('service_name', 'restart'). The next thing was putting together some unittests for this utility, to confirm that it's behaving the way it should.

This smallish task got me confused for quite some time. My first doubt was how and where to start? The library needs the DBus SystemBus and Systemd's DBus API to start, stop, load and unload systemd services. I can't directly write tests against these APIs as they would need root privileges to work; additionally, they won't work on travis. So I realized I will need to mock, and with that realization came the second doubt: which part to mock? Should I mock the things needed by the library or should I mock the library? When I looked into mocking Systemd APIs on DBus via dbus-mock, I realized this can become too big a task. So let's mock the library object and the functions which get called when I call the utility function manage_service. I had read about and noticed python's mock support, and while trying to understand it, it came across as a powerful tool, and I remembered Uncle Ben had once rightly said, with great power comes great responsibility. At one point, I was almost convinced of hijacking the utility function and having asserts around the different branches in it. But soon I also realized it would defeat the purpose of unit-testing the utility, and sanity prevailed. After looking around at lots of blogs and tutorials, and conversations with peers, I carefully mocked some functions of SystemdManager, like stop_unit and start_unit, which get called internally by the utility, and that way I was able to write tests for the different arguments which could be passed to manage_service. At the end the tests looked something like this:

import unittest
from unittest import mock

# import from the same module path that is patched below, otherwise the
# patch would not affect the manage_service under test
from senic_hub.commons.systemd_process_utils import manage_service

class TestSystemdUtil(unittest.TestCase):
    service_name = "service_name"

    @mock.patch('senic_hub.commons.systemd_process_utils.SystemdManager')
    def test_manage_service(self, mock_systemd):
        # When: start_unit works, it returns True
        mock_systemd.return_value.start_unit.return_value = True
        manage_service(self.service_name, "start")
        mock_systemd().start_unit.assert_called_with(self.service_name)

        # When: start_unit fails, returns False
        mock_systemd.return_value.start_unit.return_value = False
        manage_service(self.service_name, "start")
        mock_systemd().start_unit.assert_called_with(self.service_name)

        # When: stop_unit works, returns True
        mock_systemd.return_value.stop_unit.return_value = True
        manage_service(self.service_name, "stop")
        mock_systemd().stop_unit.assert_called_with(self.service_name)

if __name__ == '__main__':
    unittest.main()
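The tests above check that the right library function gets called, but not what happens on the failure and invalid-action branches. One way to cover those is to assert on the log output with assertLogs. The sketch below is self-contained and therefore uses a simplified variant of the utility that receives the manager as an argument; that signature, the logger name and the returned booleans are assumptions for the example, not how the real manage_service is written.

```python
import logging
import unittest
from unittest import mock

logger = logging.getLogger("systemd_utils_sketch")


def manage_service(service_name, action, manager):
    """Variant of the utility that receives the manager instead of creating it."""
    actions = {
        "start": manager.start_unit,
        "stop": manager.stop_unit,
        "restart": manager.restart_unit,
    }
    handler = actions.get(action)
    if handler is None:
        logger.info("Invalid action: %s on service: %s", action, service_name)
        return False
    if not handler(service_name):
        logger.info("Failed to %s %s", action, service_name)
        return False
    return True


class TestManageService(unittest.TestCase):
    def test_failed_start_is_logged(self):
        manager = mock.Mock()
        manager.start_unit.return_value = False
        with self.assertLogs(logger, level="INFO") as captured:
            self.assertFalse(manage_service("demo.service", "start", manager))
        manager.start_unit.assert_called_once_with("demo.service")
        self.assertIn("Failed to start demo.service", captured.output[0])

    def test_invalid_action_calls_nothing(self):
        manager = mock.Mock()
        with self.assertLogs(logger, level="INFO"):
            self.assertFalse(manage_service("demo.service", "bogus", manager))
        manager.start_unit.assert_not_called()
        manager.stop_unit.assert_not_called()
        manager.restart_unit.assert_not_called()
```

Passing the manager in removes the need for mock.patch altogether, at the cost of changing the call sites; whether that trade is worth it depends on how many modules already call the utility.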

SoFee


After finishing my work with TaxSpanner, I worked on a personal project, SoFee, for around six months. In this post I am documenting the idea behind it, what I had expected it to become and where I left it off (till now).

Features

RSS Feed

I have realized that in many of my personal projects, I work broadly around archiving the content I am reading and sharing online, be it news articles, blogs, tweet threads or videos. This form of data feels like sand which keeps slipping away as I try to hold it, to keep it fresh, accessible and indexed for reference. And it got triggered after the sunset of Google Reader. Punchagan had introduced me to Google Reader and I had soon started following a lot of people there and used its browser extension to archive the content I was reading. In some way, with SoFee, I was trying to recreate the Google Reader experience with the people I was following on twitter. And the first iteration of the project was just that: it would give an OPML file which could be added to any feed-reader, and I would get a separate feed for each of the people I am following.
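For readers who have not seen OPML: it is a small XML format that feed-readers import as a list of subscriptions. A minimal sketch of generating one with the standard library is below. The function name and the example feed URLs are made up; this is not the code SoFee used.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML 2.0 document (as a string) listing the given feeds.

    feeds is an iterable of (name, feed_url) pairs.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url in feeds:
        # one <outline> per followed person, pointing at their feed
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=feed_url)
    return ET.tostring(opml, encoding="unicode")
```

Calling `build_opml("People I follow", [("alice", "https://example.org/feeds/alice.xml")])` gives a document with a single outline entry that a feed-reader can import.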

Archiving the content of links

While taking out my feed and data from Google Reader, I also noticed that it had preserved the content of some of the links. When I tried to access them again, some links had been made private and some were no longer available (404). While working on SoFee, I came across the term link-rot and I thought this aspect of preserving the content is crucial: I wanted to archive the content, index it and make it accessible. Often, I learn or establish some facts while reading this content and I wanted it to be referable, so that I can revisit it and confirm the origins. I noticed firefox's reader-mode and used its javascript library, Readability, to extract cleaned up content from the links and add it to the RSS feed I was generating. I also came across Archive.org's Web ARChive/WARC format for storing or archiving web-pages and projects using it. I wanted to add this feature to SoFee, so that pages no longer go unavailable and there is individual, archived, intact content available to go back to. In the end, after looking at, trying out and playing with different libraries and tools, I wasn't able to finish and integrate it.

Personally Trained Model

The last feature which I had thought of including was a personally trained model which can segregate these links into separate groups or categories. Both Facebook and twitter were messing around with the timeline and I didn't want that to happen to mine. I wanted a way to control it myself, in a way which suited me. As a first step, I separated my twitter timeline into tweets which had links and others which were just tweets or updates. Secondly, I listed all these tweets in chronological order. With content extracted using Readability, I experimented with unsupervised learning (KMeans, LDA) and visualization of the results to create dynamic groups, but the results weren't satisfying enough to be included as a feature. For supervised learning, I was thinking of having a default model based on Reddit categories or the wikipedia API, which can create a generic simpleton data set, and then allowing the user to reinforce and steer the grouping to their liking. Eventually this would allow users to have a private, personal model which can be applied to any article, news site or source and give them the clustering they want. Again, I failed to put this feature together.

What I ended up with and future plans

Initially, I didn't want to get into UI and UX, and wanted to leave that part to popular and established feed-readers. But it slowed down user onboarding and feedback. I eventually ended up with a small web interface where the links and their content were listed and the timeline was getting updated every three hours or so. I stopped working on this project as I started working with Senic, and the project kept working for well above a year. Now it's non-functional, but I learned a lot while putting together what was working. It is a pretty simple project where we can simply charge the user a fee for hosting their content on their designated small vps instance or for running a lambda service (to fetch the updated timeline, apply their model to cluster data), and allow them full control of their data (usage, deletion, updation). I will for sure use my learnings to put together a more complete version of the project with SoFee2.0, let's see when that happens (2019 resolution?).

Bitbake recipes: part 2


Continuing from last post

Title of issue was: "Add leveldb to senic-os"

Now that we had the leveldb python library installed, we started using it in our implementation. We have multiple applications/processes accessing the DB and we ran into concurrency issues. We tried a couple of things, like making sure every access opens a DB connection and closes it, but it didn't pass the multiprocessing test:

import unittest
import random
import string
from multiprocessing import Process

# DbAPIWrapper (our wrapper around the DB) and update_db (a helper which
# writes "new value" to "key" from another process) are defined in the
# project and are not shown here.

class TestDevicesDB(unittest.TestCase):
    def setUp(self):
        self.db = DbAPIWrapper('/tmp/test_db.db')
        self.value = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))

    def test_multiple_instance_access_to_db(self):
        self.db.write("key", self.value)
        self.assertEqual(self.db.read("key"), self.value)
        p = Process(target=update_db)
        p.start()
        p.join()
        self.assertEqual(self.db.read("key"), "new value")
        self.db.delete("key")
        self.assertEqual(self.db.read("key"), '')
        p = Process(target=update_db)
        p.start()
        p.join()
        self.assertEqual(self.db.read("key"), "new value")

The leveldb documentation mentions that a DB can be opened by only one process at a time, and writing a wrapper to enforce this condition (locks, semaphores) with python seemed like a bit too much work. We got a DB and its bindings set up on the OS, but weren't able to use it. Back to square one.
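For reference, the kind of wrapper we decided against could start from an advisory file lock, so that only one process touches the DB at a time. The sketch below uses fcntl.flock and therefore only works on Unix-like systems; `exclusive_lock` and the lock file path are hypothetical, and this is not something we shipped.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    with open(lock_path, "w") as lock_file:
        # with LOCK_NB this raises BlockingIOError if someone else holds the lock
        fcntl.flock(lock_file, flags)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Every open, read or write of the DB would then sit inside `with exclusive_lock('/tmp/test_db.lock'):`. The lock is advisory, so it only helps if every process goes through the same wrapper, and it serializes all access, which is part of why this looked like more work than switching to a DB that supports multiple processes.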

This time, we thought of first confirming that Berkeley DB indeed supports multiple processes accessing the DB. For Raspbian, there are packages available for `berkeledb` and its python bindings, `bsddb3`. I installed them on a raspberry Pi and confirmed that the above tests work, even with multiple python instances accessing the same DB and reading/writing values. Once this was confirmed, we knew that this DB would work for our requirements, so we resumed getting the dependencies sorted out on Senic-OS.

Title of issue: "setup BerkeleyDB and bsddb3 on senic-os"

The first thing to sort out on this task was how to get the `berkeledb` binary tools on senic-os. The library, `libdb.so`, was there but the binary tools were not. Some email threads mentioned installing `db-bin`, but running `bitbake db-bin` threw the error `nothing provides db-bin`. I again reached out on the IRC channel, where good folks pointed me to this section of the recipe, mentioning that the binaries would be directly installed on the os if I included `db-bin` in the list of packages. First step sorted out \o/

Second, though the `libdb.so` file was there, somehow the `bsddb3` recipe was not able to locate it. With better insight into recipes from the work done on `leveldb`, I again looked at the initial recipe I had put together. I had to figure out how to pass extra arguments to Python's `setup` tools, giving them the location of the folders where they can find `libdb.so`. This was the right question to ask and search for; the bitbake documentation and Google helped, and finally, with the following recipe, I was able to get `bsddb3` installed on senic-os:

SUMMARY = "pybsddb is the Python binding for the Oracle Berkeley DB"
HOMEPAGE = "https://www.jcea.es/programacion/pybsddb.htm"
SECTION = "devel/python"
LICENSE = "BSD-3-Clause"
LIC_FILES_CHKSUM = "file://LICENSE.txt;md5=b3c8796b7e1f8eda0aef7e242e18ed43"
SRC_URI[sha256sum] = "42d621f4037425afcb16b67d5600c4556271a071a9a7f7f2c2b1ba65bc582d05"

inherit pypi setuptools3

PYPI_PACKAGE = "bsddb3"

DEPENDS = "db \
  python3-core \
"

DISTUTILS_BUILD_ARGS = "--berkeley-db=${STAGING_EXECPREFIXDIR}"
DISTUTILS_INSTALL_ARGS = "--berkeley-db=${STAGING_EXECPREFIXDIR}"

RDEPENDS_${PN} = "db \
  python3-core \
"

We baked Senic-OS with these libraries and dependencies, ran the multiprocessing tests, and confirmed that we were indeed able to access the DB from multiple applications. With this, the task got marked done and a new task opened up: migrating applications to use this DB wrapper instead of config files.
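To make the role of the `--berkeley-db` argument in the recipe above a bit more concrete, here is a rough sketch of the kind of lookup a setup script could do with such an option. This is not bsddb3's actual code: `find_berkeley_db`, `make_fake_prefix` and the assumed `include/db.h` and `lib/libdb*.so` layout are only for illustration.

```python
import os
import tempfile


def find_berkeley_db(argv):
    """Return (include_dir, lib_dir) for a --berkeley-db=PREFIX argument, else None."""
    prefix = None
    for arg in argv:
        if arg.startswith("--berkeley-db="):
            prefix = arg.split("=", 1)[1]
    if prefix is None:
        return None
    include_dir = os.path.join(prefix, "include")
    lib_dir = os.path.join(prefix, "lib")
    # Assumed layout: header under include/, shared library under lib/
    has_header = os.path.isfile(os.path.join(include_dir, "db.h"))
    has_library = os.path.isdir(lib_dir) and any(
        name.startswith("libdb") and ".so" in name
        for name in os.listdir(lib_dir)
    )
    if has_header and has_library:
        return include_dir, lib_dir
    return None


def make_fake_prefix():
    """Build a throwaway directory that looks like an install prefix."""
    prefix = tempfile.mkdtemp()
    os.makedirs(os.path.join(prefix, "include"))
    os.makedirs(os.path.join(prefix, "lib"))
    # Empty placeholder files; only their names matter for the lookup
    open(os.path.join(prefix, "include", "db.h"), "w").close()
    open(os.path.join(prefix, "lib", "libdb-5.3.so"), "w").close()
    return prefix


if __name__ == "__main__":
    prefix = make_fake_prefix()
    print(find_berkeley_db(["setup.py", "build", "--berkeley-db=" + prefix]))
    print(find_berkeley_db(["setup.py", "build"]))
```

In the recipe, the prefix is whatever `${STAGING_EXECPREFIXDIR}` expands to during the build, so the lookup happens in the build's staging area rather than on the build machine's own `/usr`.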

Bitbake recipes: part 1


In this two-part post, I am trying to document the process of deciding on an issue, and how it evolves while trying to resolve or "close" it.

After the initial release of Nuimo Click, we looked at the pain points in our backend stack. Currently we use a lot of config files across processes: some applications write to them, others read them to connect to smart home devices. We use threads which "watch" for changes to these config files so that applications update themselves. Overall this is becoming a pain and also leaves us in nondeterministic states, which results in bad UX and is hard to reproduce and debug. We had thought of using a database or, better, a key-value storage system for one sprint, and I tried to take a shot at it.

Title of issue: "Add DB to senic-os"

Avichal, my colleague, had already done some benchmarking of the available options, and I started from this reference point. There was initial consensus on using the popular BerkeleyDB; the library (libdb.so) was already part of SenicOS, and we just needed a recipe for its python3 library to start using it.

Title of issue: "Add BerkeleyDB to senic-os"

I started exploring how to get the library installed but got stuck on the following error:

user@senic-hub-02c0008185841ef0:~# ls -la /usr/lib/libdb-5.so 
lrwxrwxrwx    1 user     user            12 May 15 13:44 /usr/lib/libdb-5.so -> libdb-5.3.so
user@senic-hub-02c0008185841ef0:~# pip3.5 install bsddb3
Collecting bsddb3
  Downloading https://files.pythonhosted.org/packages/e9/fc/ebfbd4de236b493f9ece156f816c21df0ae87ccc22604c5f9b664efef1b9/bsddb3-6.2.6.tar.gz (239kB)
    100% |                                | 245kB 447kB/s 
    Complete output from command python setup.py egg_info:
    Can't find a local Berkeley DB installation.
    (suggestion: try the --berkeley-db=/path/to/bsddb option)

    ----------------------------------------

A little background: Yocto is a project for building custom embedded Linux distributions, which can be tailored for any SoC to create a lightweight distribution with a small footprint. For libraries and packages which need to be shipped with the OS, we need to write recipes. A recipe is a small config file which defines the components, the source of the package, its dependencies, the tools needed to compile it, etc. Many packages already have recipes in the layer-index, where they can be searched and integrated out of the box, but at times we need to write one.

I am familiar with writing and figuring out how to put together a recipe which should work, but I needed some more understanding of the internals. While the python package, bsddb3, is actively maintained, Berkeley DB itself was available for download only behind an Oracle sign-in. I wasn't able to get these dependencies sorted out, hence I started looking at alternatives.

leveldb was a good option: it is actively maintained, has a recipe available for its core package, and its python library is also well maintained.

Title of issue: "Add leveldb to senic-os"

As I tried to put together a recipe for leveldb, I got stuck trying to figure out how to make cython compile header files from the library for the native platform we used. I reached out on the IRC channel (#oe on irc.ubuntu.com) and shared my recipe and my doubt; folks there helped me understand how to enable cython to compile for the native platform. Here is how the final recipe looked:

DESCRIPTION = "Plyvel is a fast and feature-rich Python interface to LevelDB."
HOMEPAGE = "https://github.com/wbolster/plyvel/"
SECTION = "devel/python"
LICENSE = "BSD"
LIC_FILES_CHKSUM = "file://LICENSE.rst;md5=41e1eab908ef114f2d2409de6e9ea735"
DEPENDS = "leveldb \
  python3-cython-native \
  python3-setuptools-native \
"

RDEPENDS_${PN} = "leveldb"

# using setuptools3 fails with make command
# python3native is needed to compile things using python3.5m
inherit setuptools python3native

S = "${WORKDIR}/git"
SRC_URI = "git://github.com/wbolster/plyvel.git;tag=1.0.5 \
  file://setup.patch \
"
PV = "git-${SRCPV}"

I needed to add a small patch to the python package to get it compiled with python3:

diff --git a/Makefile b/Makefile
index 2fec651..2c2300a 100644
--- a/Makefile
+++ b/Makefile
@@ -3,8 +3,8 @@
 all: cython ext

 cython:
-	cython --version
-	cython --cplus --fast-fail --annotate plyvel/_plyvel.pyx
+	cython3 --version
+	cython3 --cplus --fast-fail --annotate plyvel/_plyvel.pyx

 ext: cython
	python setup.py build_ext --inplace --force
diff --git a/setup.py b/setup.py
index 3a69cec..42883c6 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,10 @@
 from os.path import join, dirname
 from setuptools import setup
+from distutils.command.build import build
 from setuptools.extension import Extension
 import platform
+from distutils.command.install import install as DistutilsInstall
+from subprocess import call

 CURRENT_DIR = dirname(__file__)

@@ -14,6 +17,16 @@ def get_file_contents(filename):
	 return fp.read()


+class BuildCython(build):
+    def run(self):
+        cmd = 'make'
+        call(cmd)
+        build.run(self)
+        # do_pre_install_stuff()
+        # DistutilsInstall.run(self)
+        # do_post_install_stuff()
+
+
 extra_compile_args = ['-Wall', '-g']
 if platform.system() == 'Darwin':
     extra_compile_args += ['-mmacosx-version-min=10.7', '-stdlib=libc++']
@@ -53,5 +66,8 @@ setup(
	 "Topic :: Database",
	 "Topic :: Database :: Database Engines/Servers",
	 "Topic :: Software Development :: Libraries :: Python Modules",
-    ]
+    ],
+    cmdclass={
+        'build':BuildCython
+    }
 )

To be continued…