Nayee Taleem

Krunal was visiting Nayee Taleem/नई तालीम from December 12th to 21st and asked me to join for a visit to understand the school model. After the visit to the Bhasha Institute, it was one more opportunity to understand how alternative schools work. The philosophy behind the school was formulated by Gandhi long ago, around 1937, and goes something like: learning by doing, and using the real world as a classroom to understand different subjects. In 2005 the school was reopened in Anand Niketan, and since then they have been trying many things to improve the learning environment and working really hard to keep it relevant in current times.

The school compound itself is really beautiful. campus.jpg

We reached Sevagram on the 11th, and from the next day Krunal and Preeti facilitated a three-day workshop for teachers to explore different ideologies for making the school environment more "children centered". Many models of existing schools from around the world and India were presented and discussed, and participants were encouraged to reflect on whether there was a similar environment in Nayee Taleem; if not, what the reasons behind that were, whether the models could be adopted for this school, and if yes, how.

On the school campus, many methods, activities and tailored curricula are used to teach science, maths and other subjects. They have farms, art installations, and craft and sewing classes. In the case of farming, each class is allotted a piece of the farm and is responsible for growing, maintaining and selling the produce. produce.jpg

During a show-and-tell session, one of the teachers had caught a cobra/naag on the campus. He showed kids and teachers how it looks and how to identify one, and also mentioned how its bite could be lethal. Later the same teacher released the snake on the remote outskirts of the village. show_and_tell.jpg

For lunch, classes take turns, and the students along with their teacher prepare lunch for the whole school. Basically, in Krunal's words, Gandhi's idea was that kids should be made familiar with tools and how to use them, and that is what the school seemed to be doing.

Despite all this, where they (school admin, teachers, students) lag is that, with current technological developments, these tools have changed drastically and many have been rendered obsolete. For computer education they are using a standard curriculum which introduces kids to Paint, Word and other basic computing skills, and that too very superficially. Computers, mobiles and "smart" devices are everywhere, but without knowing how to use these tools, people in the locality become mere consumers of things which are being actively developed somewhere else and might not be meant for the village's local context and needs. I personally think diversity is good; different solutions to a problem/challenge bring better insight into the problem and even better solutions.

We were thinking about how to introduce computers with a DIY approach, so that children can learn these modern tools just like they are learning other tools. While returning, Krunal mentioned conducting some engaging and fun sessions where we explore different themes (games, a makers-lab kind of setup). I have tried to do this before, but one thing I have realized is that the process of making is slow, and I don't exactly know how to take sound-bites out of it and conduct sessions around them. Also, my knowledge of making itself is limited. I was thinking of doing a gaming session leading to designing a simple game level and then playing it, or doing some more engaging sessions using Arduino/mobiles. I have exchanged models/references with Krunal which can be used (1, 2, 3, 4, 5, 6); I will try to explore more on this and see what can be used or developed for Nayee Taleem.

Personally I am confused and find myself severely incompetent for this particular task. On one side, I can totally relate to attempts being made to improve AI and projects like home automation. I was reading one HN thread, and there are situations where people depend on Alexa or Google Home to know about the weather outside their own homes. Somehow I can't relate to that vision, where people are so unaware of their surroundings. On the other hand, kids at Nayee Taleem, the Bhasha Institute and many other such places are very aware of the environment; they care for it and nurture it. But I am not confident this narrative will hold up against the widespread and blind adoption of technology. Only time will tell how all this pans out, but try we must.

Bhasha Institute Tejgarh

Thanks to Ruby's relentless efforts (an email chain spread over two months), we finally managed to fix dates with the Bhasha Institute.

What is this Institute?

Ganesh N Devy set up the institute with the idea of providing tribal communities a space where their art, culture, learning and knowledge of their environment could be celebrated, curated and preserved. It is located near the village of Tejgarh, Vadodara, Gujarat. At the moment the administration and all operations are carried out by local people, for the local people.

What's the need?

Read and watch this piece on PARI about how "tribal girls sing English songs in a village that doesn't speak English, in honor of the potato that they don't eat". It very accurately depicts the broken education system. The founders and supporters of Bhasha recognized this issue from the start and focused on introducing formal education while keeping it relevant to the local context: enabling locals to take things forward, and finding people who understand the local issues and are motivated to take charge and find a possible solution.

How are they doing it?

Bhasha has different "verticals", and they regularly organize workshops for all of them to keep evolving and adapting through an exchange of skills between locals and visitors. There is a tribal museum for curating local arts; a library/publications wing to document, preserve and publish local knowledge; a small medical team with both allopathy and homeopathy treatments available for locals; and the Shaala, a school which works as a support system to get kids ready for mainstream schooling. In the Shaala they take local students (aged 8 to 12) belonging to a mix of tribes speaking different dialects/languages/bhasha. They have a multi-lingual teaching system to get students at ease with the different dialects/languages, and they also introduce formal Gujarati in the process, to enable kids to read and write. Eventually, after 2 years, with help from the institute, the students are admitted to schools. Apart from language, they are also taught local skills related to farming, folk songs and their own culture.

What was I doing there?

Ruby, Praful and Sanket had first-hand experience with tribal education at the school running in Nelgunda by Hemalksha, and we had a lot of questions about how things were managed and the core idea behind the institute. There were reflections/discussions on what is different between the tribals of different regions and how Bhasha as an institute is trying to stay relevant. As for me, it was mostly observing: the institute, the activities going on in the campus, and how Ruby, Sanket and Praful interacted with the local kids (they taught the kids two lines from a Madia folk song). I wasn't able to contribute back to the local community during this stay, but next time, for sure, I will.

Using setTimeout in JavaScript

tldr: ALWAYS take a brief look at the official developer documentation of functions.

I was trying to rush a release over the weekend, and I had a requirement to make repeated API calls to track the progress of a task. Without that, any new user would be seeing a "dead page" without any info on what is going on and how he/she should proceed. The pseudo-code would be something like:

function get_task_status(task_id) {
  $.get("/get_task_status/", {'task_id': task_id})
    .done(function(data) {
      // Update status div
      // Wait for x seconds and repeat this function
    });
}

As usual, I hurried to Google to search for template/pointer code. StackOverflow didn't disappoint, and I landed on this discussion. It had decent insight on using a callback function with setTimeout, and I cooked up my own version of it:

function get_task_status(task_id) {
  $.get("/get_task_status/", {'task_id': task_id})
    .done(function(data) {
      if (data['task'] === 'PROGRESS') {
        Materialize.toast(data['info'], 1000);
        setTimeout(get_task_status(task_id), 2000);
      }
    });
}

Looks innocent, right? Well, that's what kept me stumped for almost 3-4 hours. I tried this, and my JavaScript happily ignored setTimeout and the delay in seconds and kept making continuous GET requests. I tried some variants of the above code, but nothing worked. Eventually I came across this post on SO, tried the code, and it worked! I was convinced that there was some issue with the older handling of setTimeout and that the 2016 update was what I needed.

Today, as I sat down to put together a brief note on this experience, I was testing the setTimeout code on node, in the browser console and inside an HTML template, and somehow each time the delay was working just fine:

> function get_task_status(task_id) {
...   console.log(Date());
...   // Recursive call to this function itself after 2 seconds of delay
...   setTimeout(get_task_status, 2000);
... }
> get_task_status('something');
Tue Oct 25 2016 15:55:59 GMT+0530 (IST)
> Tue Oct 25 2016 15:56:01 GMT+0530 (IST)
Tue Oct 25 2016 15:56:03 GMT+0530 (IST)
Tue Oct 25 2016 15:56:05 GMT+0530 (IST)
Tue Oct 25 2016 15:56:07 GMT+0530 (IST)
Tue Oct 25 2016 15:56:09 GMT+0530 (IST)
(To exit, press ^C again or type .exit)

Again, this bummed me out. I thought I had "established" that setTimeout was broken and that promises were what I should be looking at and getting a better understanding of. While trying to work it out and understand what was wrong, I checked the MDN documentation of the function, and I finally realized my real bug. The syntax of the function is var timeoutID = window.setTimeout(func[, delay, param1, param2, ...]);

And this is what I was doing: setTimeout(get_task_status(task_id), 2000);

Notice that in the syntax the params come after the delay argument, while I had passed them directly by calling the function; the call should have been setTimeout(get_task_status, 2000, task_id);. That was the small gotcha (and my test runs above worked because there I passed the bare function reference, without any argument). I was talking to Syed ji about this experience, and he pointed me to the You Don't Know JS series for a better understanding of JavaScript concepts and nuances. I learned my lesson: properly RTFM. As for promises, I will return to learn more about them later; at the moment my code is working.


Yesterday punchagan introduced me to PEBKAC - Problem Exists Between Keyboard And Chair. It was a Thing I Learned (TIL). I had experienced this many times before; I just didn't know that there was such a term.

Starting in April, at TaxSpanner I was given the task of integrating the ITD Webservices APIs for income tax return filing and other features with our existing stack. The procedure included quite a few things alien to me. The sample code provided by ITD was in Java, using the Spring framework. Our requests had to be routed via a specific proxy approved by ITD, and furthermore we physically needed a USB DSC key registered with ITD to encrypt the communication.

As I was trying to get the first successful run of the API working from my system, I was stuck accessing the DSC key from my Java code. It needed the drivers available here and the Java security file (/etc/java-7-openjdk/security/ to be edited properly to use the correct drivers. After doing these things, the first thing I tried was to list the certificates on the USB token using keytool. On the first run, it worked fine. I was ecstatic: one fewer unknown from the pile of unknowns, right? Wrong. As soon as I tried to run Java programs using the DSC, it threw up lines and lines of errors which went something like:

org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace []
Offending resource: class path resource [ClientConfig.xml]

	at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(
	at org.springframework.beans.factory.parsing.ReaderContext.error(
	at org.springframework.beans.factory.parsing.ReaderContext.error(
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.error(
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.parseCustomElement(
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.parseCustomElement(
	at itd_webs.core.main(Unknown Source)
2016-05-11 12:03:48,114 [main] WARN  org.apache.cxf.bus.spring.SpringBusFactory -  Failed to create application context.
org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace []
Offending resource: class path resource [ClientConfig.xml]

Getting the configs in place so that the Spring framework could load the proper credentials from the DSC was another task where inputs from Nandeep proved very crucial. Thankfully we had one more system where the Java code and the setup with the DSC worked, so it was clear that this particular DSC-recognition error was just on my system. After a lot of head scratching, comparing the two systems to identify if something was amiss, and trying strace, nothing helped. After scrolling through a lot of Java-related StackOverflow conversations, I was playing around with keytool and jdb. As I ran jdb, I noticed the DSC blinking, and I thought that maybe the default java was using different configs. I checked /usr/lib/jvm/ and indeed there were 4 different versions of Java. java -version pointed to java version "1.8.065", so instead I tried to compile and run using /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java, and the USB blinked happily ever after. While we kept developing and stabilizing the system on the computer where things worked, it took almost two weeks to narrow down the exact issue on my system. And now I have a name for all the time spent: PEBKAC. Thank you, punchagan.

Cosine Similarity: Simple yet Powerful Algorithm

I came across cosine similarity when I was exploring document clustering (1, 1). It is a very simple concept for measuring the similarity between two vectors, and these vectors can be n-dimensional.
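For two vectors, the measure is just the dot product divided by the product of their magnitudes; a minimal pure-Python sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1, orthogonal vectors score 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # 1.0 (within float error)
print(cosine_similarity([1, 0], [0, 1]))        # 0.0
```

Note that the result depends only on direction, not magnitude, which is exactly why it works well for comparing feature vectors of different overall energy.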

Sometime back, Nandeep took me along to NID to help out Masters students with their projects. Kalyani, one of the students, was making a haptic device which could help students with hearing disabilities practice and learn alphabets on their own. From her field visits she got to know that, at the moment, most sessions are done in person with the trainer; in these sessions kids feel the vibrations of the trainer's throat to identify how to speak. Her project was around this concept: replicating these vibrations on a physical device, along with an app which can "listen" to what students are saying and compare it against some standard audio samples of characters.

Nandeep quickly found this script, which used Librosa to compare two audio samples. It was a good start; we were able to get some idea of how we could use these tools for our sample set. When we looked at the values returned by MFCC, we realized each was a vector. With DTW we tried a few custom distance calculations, like the average, the difference between max values, etc., but cosine similarity gave us the best results. We put together this script, tested it with different samples and checked the performance, and Kalyani was quite satisfied with it for her prototype.
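The combination we converged on, DTW over frame sequences with cosine distance as the local measure, can be sketched in pure Python. The frame sequences here are toy stand-ins for real MFCC output, and the DTW is the textbook recurrence rather than the exact implementation we used:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, so identical directions cost 0."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def dtw(seq_a, seq_b, dist=cosine_distance):
    """Classic dynamic-time-warping cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Toy "MFCC-like" frames: b points the same way as a (just louder),
# c points in different directions, so a is closer to b than to c.
a = [[1, 0], [0, 1], [1, 1]]
b = [[2, 0], [0, 3], [2, 2]]
c = [[0, 1], [1, 0], [-1, 1]]
print(dtw(a, b) < dtw(a, c))  # True
```

Because cosine distance ignores magnitude, the recording's loudness drops out and only the shape of the frames matters, which matches why it beat the magnitude-sensitive custom distances we tried.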

We got quite lucky with this algorithm fitting our problem scenario; simple online searches alone might not have led us in this direction.

Using Hekad to parse logs for relevant parts

We at TaxSpanner have been looking at different options for an analytics pipeline and setting things up to capture relevant information. Vivek had noticed Hekad and was wondering if we could use the syslogs already being generated by the app. The idea was to look for specific logs in a format which could contain information like the app name, model, UUID and the operation being done.

We followed the basic setup guide to get a feel for it, and it was processing nginx logs in the tune of millions very quickly. Apart from the official documentation, this post talks about how to set up a quick filter around the Hekad processing pipeline. We experimented with a client-server setup where a client running on the app server tails the django log file, filters the relevant log messages and pushes them to a server hekad instance aggregating logs from all app servers.

This was the client-side hekad config TOML file:

# hekad requires [SectionName] headers; the names used here are illustrative.
[hekad]
maxprocs = 1
base_dir = "."
share_dir = "hekad-location/share/heka/"

# Input is the django log file
[DjangoLogInput]
type = "LogstreamerInput"
splitter = "TokenSplitter"
log_directory = "logs/"
file_match = 'django\.log'

# Filter to parse logs and extract the relevant lines
[DjangoLogDecoder]
type = "SandboxFilter"
message_matcher = "Logger == 'django_logs'"
filename = "lua_decoders/django_logs.lua"

# Encoder for the output stream
[PayloadEncoder]

# We channel output generated from DjangoLogDecoder to a certain UDP port
[UdpOutput]
message_matcher = "Logger == 'DjangoLogDecoder'"
address = ":34567"
encoder = "PayloadEncoder"

The Lua script to filter the relevant logs is pretty small:

local string = require "string"
local table = require "table"

-- This structure could be used in a better way
local msg = {
    Timestamp   = nil,
    Type        = msg_type,
    Payload     = nil,
    Fields      = nil
}

function process_message ()
    local log = read_message("Payload")
    if log == nil then
      return 0
    end
    local log_blocks = {}
    for i in string.gmatch(log, "%S+") do
      table.insert(log_blocks, i)
    end
    if table.getn(log_blocks) >= 4 then
      if log_blocks[3] == "CRITICAL" then
        msg.Payload = log
        inject_message(msg)
      end
    end
    return 0
end
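For readers who don't speak Lua, the filter's core check is tiny; here is the same logic in Python, assuming the hypothetical log format of whitespace-separated fields with the level as the third token:

```python
def is_relevant(log_line):
    """Keep a line only if it has at least 4 whitespace-separated
    tokens and the third one is the CRITICAL level (hypothetical format)."""
    blocks = log_line.split()
    return len(blocks) >= 4 and blocks[2] == "CRITICAL"

print(is_relevant("2016-02-01 12:00:01 CRITICAL payment gateway timed out"))  # True
print(is_relevant("2016-02-01 12:00:02 INFO request served"))                 # False
```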

With the client instance in place, we now get our listener config sorted out.

# Section names here are illustrative.
[hekad]
maxprocs = 4
base_dir = "."
share_dir = "hekad-location/share/heka/"

# Input listening on the UDP port
[UdpInput]
type = "UdpInput"
address = ":34567"

# Output matches the received messages and just prints them
[LogOutput]
message_matcher = "Logger == 'app_logs'"
encoder = "PayloadEncoder"

[PayloadEncoder]

And that's it; this puts a basic hekad-based pipeline in place which can simply pick the relevant information out of the django logs.

Issues with Indexing while using Cassandra

We have a single-machine Cassandra setup on which we are trying different things for analytics. One of the Column Families we have goes with this description:

CREATE TABLE playground.event_user_table (
    event_date date,
    event_time timestamp,
    author text,
    content_id text,
    content_model text,
    event_id text,
    event_type text,
    PRIMARY KEY (event_date, event_time)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': ''}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX author ON playground.event_user_table (author);

With this table, we populated data for different apps/models. Now, when we query the system with something like:

cqlsh:playground> select * from event_user_table where event_date = '2013-06-02' ;
 event_date | event_time               | author                                 | content_id                                      | content_model | event_id | event_type
 2013-06-02 | 2013-06-02 00:00:00+0000 |        |        |           A   |          |  submitted
 2013-06-02 | 2013-06-02 01:28:13+0000 |      |                                      1000910424 |           B   |          |     closed
 2013-06-02 | 2013-06-02 01:59:31+0000 |         |         |           A   |          |    created
 2013-06-02 | 2013-06-02 02:00:44+0000 |            |            |           A   |          |    created
 2013-06-02 | 2013-06-02 02:02:16+0000 |       |       |           A   |          |    created

The result looks good and as expected. But when I query the system on the secondary index on author, I get empty or partial results:

cqlsh:playground> select * from event_user_table where author = '' ;
 event_date | event_time | author | content_id | content_model | event_id | event_type

(0 rows)

cqlsh:playground> select * from event_user_table where author = '' ;
 event_date | event_time               | author                             | content_id | content_model | event_id | event_type
 2014-01-18 | 2014-01-18 09:01:52+0000 | | 1001068325 |           SRF |          |     closed

(1 rows)

And I have tried this combination of PRIMARY KEY too: ((event_date, event_time), author), but with the same results. There are known issues with secondary indexes and scaling1, but does it affect single-node systems too? I am not sure about it. Time to confirm things.

Update1 <2016-02-10 Wed 15:45>: As mentioned here2, Cassandra does "'lazy' updating of secondary indexes. When you change an indexed value, you need to remove the old value from the index." Could that be the reason?

Extending Zulip to provide a Chat-With-Us / Helpdesk interface

Here is the github repo:

We at TaxSpanner had been using Olark to help our customers over a chat interface. While their interface is very mature and delivers all the features they mention, we were not able to manage things in terms of how a customer was reaching out to us via different mediums (email, chat, phone). As Zulip got released and we were giving it a try for our internal team communication (both the tech and sales teams), the idea was floated: could we extend the same interface to our customers too?

We set the target of getting a prototype in place which could replace Olark; if during trials the feature requests turned out reasonable and manageable, we would take a call. The idea was to expose a limited view to customers while having the full-featured interface for our support team. We created two stripped-down and simple HTML templates to be exposed to customers, got a view in place along the lines of Zulip's home view with additional logic to route customers to the sales team, added notifications for offline and online sales-team members, and added small checks in the main Zulip views to make sure the full views are exposed only to the internal team.

For example, in the function get_status_dict in zerver/libs/ there is a hook for MIT users:

# Return no status info for regular users
if requesting_user_profile.realm.domain != settings.ADMIN_DOMAIN:
    return defaultdict(dict)

and an additional check in the home view:

from zerver.forms import has_valid_realm
if not has_valid_realm(...):
    from django.contrib.auth import logout
    # making sure user is logged out from the session.
    logout(request)
    return redirect('/support')

For adding this interface on the landing page, we added the following HTML:

<div id="zulip" style="bottom: 0px; right: 0px; position: fixed; z-index: 9999;" >
    <div style="width: 200px; height: 32px;" id="zulip-chat">
        <span class="icon--speech-bubble"></span>
        Chat with Us!
        <iframe id="zulip-iframe" height="350px" width="450px" style="border:1px solid gray; display: none;" src=""></iframe>
    </div>
</div>

and this JavaScript code to handle user clicks:

$('#zulip-chat').on('click', function(e) {
  if ($('#zulip-iframe').is(':hidden')) {
    $('#zulip-iframe').attr('src', 'https://url-to-zulip-instance');
  }
});
At the moment we auto-login the customer after asking for their email id. For security purposes, we create a new private stream for the customer and unsubscribe them from previously existing streams. Ideally there should be a way (OAuth or server-to-server authentication) to make sure the user is logged in on the main site, and then enable the previous history of the chats they have had.
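The flow above can be outlined roughly like this; every helper name here is a hypothetical stand-in for the actual Zulip internals, injected as a parameter so the sketch stays self-contained:

```python
def start_customer_chat(email, existing_users, create_user, create_stream,
                        subscribe, unsubscribe_all, log_in):
    """Hypothetical outline: given a customer email, end up logged in
    with exactly one fresh private stream shared with the sales team."""
    if email not in existing_users:
        create_user(email)
    # A new private stream per customer, so they never see
    # history from other customers' conversations.
    stream = create_stream(name="support-" + email, private=True)
    unsubscribe_all(email)          # drop any previously existing streams
    subscribe(email, stream)
    subscribe("sales-team", stream)
    log_in(email)
    return stream
```

The important invariant is the order: unsubscribe from everything old before subscribing to the fresh stream, so there is no window in which the customer sees another stream's history.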

I am not exactly sure about this (ab)use-case of Zulip, and quite possibly there is something I am overlooking/missing. But Zulip has a really strong chat interface; if we can integrate our APIs around it, it will give us a lot of control. Adding bots, notifications and some simple intelligence, and developing things on top of it, could enable and extend the existing web application in a lot of ways.

Setting up Zulip with Docker

UPDATE <2015-12-20 Sun>: The Zulip repo itself now has Docker support, which is much cleaner than what I had done.

UPDATE <2015-11-15 Sun>: webpack with this config is not working neatly, so while running the docker image check out revision ae04744.

After attending RC, punchagan had mentioned Zulip quite a few times on the google-talk based bot which we use for communication. There were a few discussions in the group about trying something else, like Slack or another tool which could cater to a group spread across different channels (WhatsApp, Hangouts, legacy google-talk) along with rich features like sharing photos, docs etc. As Zulip got released sometime back, we were excited to give it a try.

It got released on a Friday, so we thought of giving it a shot over the weekend and trying to get something we could play around with. As we looked at the initial documentation, it involved a lot of sudos for the setup. While we had access to a server, we didn't want to experiment at the system level, so we thought of trying Docker. With a sample ubuntu image we quickly got all the dependencies installed and were able to run the file.

To get a "standard" setup, we thought we would try to run the different services in different containers, as instructed here. But we soon ran into the issue of manually editing code in the zulip setup to initialize the different services (DB, rabbitmq etc.). I also think I was not doing it the right way; I was installing most of the packages in all the containers. So we reverted to having a Dockerfile with one container holding the complete setup.

Now there is a PR related to a Docker setup for Zulip too. But being a starter with Docker, I wasn't quite sure of all the steps being done there. We ended up with this setup: a Dockerfile, a shell script and two custom scripts (though in the subprocess command of there is a check to make sure process_fts_updates works without a password, it wasn't working from the shell script; the second file just comments out the part of where there are certain checks for registration).

Build the docker instance:

$ sudo docker build -t zulip-instance .

And finally to get the instance up:

$ sudo docker run --name zulip -i -t -p 9991:9991 zulip-instance

There are still issues. For example, there should be some bots present initially to create users and realms, and while trying to understand the management commands to do the same (creating a bot user), there seems to be a circular dependency with the "notify_created_user" step. Or maybe I have overlooked something during this step. Also, these are development settings; for production, things would be different.

FOSSEE Project

I got a mail from Prabhu (PR) a few days back about the extension of the FOSSEE project and that they are looking for people for different roles. I had joined the project fairly early (July 2010), when it was getting started and we were trying different ways to show the power of Python as an alternative to the popular proprietary tools used in academia, in terms of writing code, calculations, speed, everything.

We did workshops in different colleges, recorded screencasts, organized events, wrote documentation and even ran a full-fledged course, SEES, around Python and various tools which could improve students' skills in handling their courses and code with good practices.

Though this sounds like a lot of documentation work, and not exactly like a "developer profile", reaching the level where you yourself understand these concepts, algorithms and good practices is, I think, the biggest thing on offer for starters. And this time, as PR mentioned in the mail, they are also looking for developers to contribute, develop and push things with some mainstream Open Source projects. Personally, I met wonderful people there: Prabhu, Asokan, punchagan, madhu, vattam, dusual and many more from an active and thriving community. There is always good scope and room to grow.

When I had joined the project initially, I was excited about probably contributing, being able to sit through classes, etc. The mission of the project is high-minded, and this kind of adaptation and acceptance is never overnight, always gradual. I remember being frustrated by the lack of enthusiasm from the participants and by not seeing results. But that's the thing: you get to take a chance, try things, see if they work; if not, adapt, try more, but try.

Note: I worked with the team only at the very beginning, and only for a year, so I am not aware of how things currently work at FOSSEE; I think contacting them directly for the latest update would be best.