Getting the act together, personally.

I have been working on the SocialMedia Feed project since last August. Time after time I have found myself in a slump, looking for motivation, for a head rush to finish a task. I try to watch a movie, find some song-of-the-day, something, anything, and before I realize it, it's the next day already. They say, "Show up, show up, show up, and after a while the muse shows up, too", but personally, while trying it, self-motivation drags, in a really bad way.

Last week I was trying to put together a POC for some possible paid work. The task was to create a chat bot around the different columns of an Excel sheet, so that users could chat with it and get analytical results in a conversational manner. Instead of a technical person running a query raised by the sales team, a user could directly ask the bot a question like "How many tasks finished successfully today", where a task could be an email campaign, a nightly aggregation of data or a long-running algorithm. Artificial Intelligence Markup Language (AIML) and its concepts are fairly popular for putting together such chat bots, and many popular platforms like Pandorabots, api.ai and wit.ai help create such an interface to "train" a bot. In the message above, "tasks", "successfully" and "today" give us the context: which column to run the query on, what value to look for, and the duration over which we want the count.
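
To make that concrete, here is a rough sketch of what a webhook behind such a bot could do once the platform has extracted the entities. The payload shape and the column names ("status", "finished_at") are my assumptions for illustration, not api.ai's actual format:

import pandas as pd

# Hypothetical sheet with one row per task run
df = pd.read_excel("tasks.xlsx")

def answer(entities):
    # entities: e.g. {"column": "status", "value": "success", "period": "today"}
    rows = df[df[entities["column"]] == entities["value"]]
    if entities.get("period") == "today":
        today = pd.Timestamp.now().normalize()
        rows = rows[pd.to_datetime(rows["finished_at"]) >= today]
    return "%d tasks finished successfully today" % len(rows)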

I didn't have prior experience with the api.ai platform, so I had planned to read up on some docs and train the system to identify a context, maybe a couple of them: ~2 hrs of effort. I ended up starting to work on it only past 12 pm, browsing through random doc links, exploring the SDK for examples, trying to find my way through setting up the pipeline. By 6 in the evening I had clocked around 3 hours and 30 minutes on this task. I had the context in place, and since api.ai provides a webhook to call a third-party API, I used their github demo code and got results from the Excel sheet which was shared. Very minimal work, but still, I was able to sort out the first level of unknowns.

It was frustrating and tiring, but I felt this pressure to finish it. Though the POC worked, I didn't get the work. What is really sad is how well I was able to stick to a deadline when the pressure of proving myself, of convincing someone else, was there. For the SocialMedia Feed project I have this task: on top of the basic gensim topic words, implement the algorithms from the Termite and LDAvis papers to identify better topic representations and get a demo in place. I have reference code available (the papers' code is on github), I know what I have to do, but still, through the last 3 days I haven't clocked a single minute on this task. It will happen eventually, but I think the idea is to get fired up personally, take on things, spend time on them and then mark them done; to be professional, not just for others.

Shoutout to punchagan for his inputs on the initial draft.

Service worker adventures

With SoFee, the major work is done in the background using celery: polling twitter for the latest statuses, extracting the links and fetching their content, and eventually the segregation of content will also be done this way. I was looking for a way to keep things updated on the user's side, and the concepts of Progressive Web Apps were really appealing.

What does it do?

Browsers (google chrome, firefox et al.) are becoming more capable as new web standards roll out: offline cache, push notifications, access to hardware (the physical web). With these features, HTML-based websites can now also work like a native app on your phone (android, iPhone) or desktop.

Stumbling Block #1: Scope and caching

I am using Django, and with it all static content (CSS, JS, fonts) gets served from /static. For service workers, if we do that, the worker's scope gets limited to /static, that is, it would only be able to handle requests served under /static. This blocks the API calls I am making. I looked around and indeed there was a stack-overflow discussion around the same issue. It's a hacky solution, and I added to it by passing some GET params which I can use in template rendering for caching user-specific URLs.
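
One way to implement that workaround in Django is a minimal sketch like the following; the URL pattern and template name here are mine, not from the discussion:

from django.conf.urls import url
from django.views.generic import TemplateView

urlpatterns = [
    # Serve the service worker from the site root so its scope covers the
    # whole origin instead of just /static. Rendering it as a template also
    # lets the GET params mentioned above drive user-specific cache URLs.
    url(r'^sw\.js$', TemplateView.as_view(
        template_name='sw.js',
        content_type='application/javascript')),
]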

Beyond this I had a few head-scratchers while getting the cache to work. I struggled quite a bit to short-circuit the fetch request and return a cached response, but it just wouldn't work. I kept tweaking the code and experimenting until I used Jake's trained-to-thrill demo as a base to set things up from scratch and then build on top.
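
For the record, the short-circuit pattern itself is small; a minimal cache-first handler along the lines the trained-to-thrill demo builds on (the cache name and pre-cached URLs are placeholders):

self.addEventListener('install', function(event) {
  // Pre-cache the app shell; 'sofee-static-v1' and the URL list are placeholders
  event.waitUntil(
    caches.open('sofee-static-v1').then(function(cache) {
      return cache.addAll(['/', '/static/css/app.css']);
    })
  );
});

self.addEventListener('fetch', function(event) {
  // Short-circuit the request: answer from cache, fall back to the network
  event.respondWith(
    caches.match(event.request).then(function(cached) {
      return cached || fetch(event.request);
    })
  );
});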

Stumbling Block #2: Push Notifications

Service workers provide access to background push notifications. In earlier releases, a browser would register for this service and return a unique endpoint for the subscription, a capability URL which the server uses to push notifications to. While the endpoint provided by Firefox works out of the box, chromium and google chrome still returned an obsolete GCM-based URL. Google has now moved to the Firebase SDK, and GCM is no longer supported. Beyond this, on the server side the PyFCM library worked just fine to push notifications, and it works with firefox too.
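
On the server side the PyFCM call is short; a minimal sketch with placeholder credentials (the registration id is whatever the service worker's push subscription returned):

from pyfcm import FCMNotification

# Both values below are placeholders
push_service = FCMNotification(api_key="<firebase-server-key>")
result = push_service.notify_single_device(
    registration_id="<subscription-token>",
    message_title="SoFee",
    message_body="New links fetched from your feed")
print(result)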

Quest to find k for KMeans

With the SoFee project, at the moment I am working on a feature where the system can identify the content of a link as belonging to one of several wider categories, say technology, science, politics etc. I am exploring topic modelling, word2vec, clustering and their possible combinations to ship the first version of the feature.

Earlier I tried to extract a certain number of topics from the content of all the links. I got some results, but across the mix of links these topics were not coherent. Next I tried to cluster similar articles using the KMeans algorithm and then extract topics from the grouped articles. KMeans requires one input parameter from the user, k, the number of clusters to return. In the application flow, asking the user for such an input wouldn't be intuitive, so I tried to make it hands-free. I tried two approaches:

  • Run the KMeans algorithm with k varying from 2 to 25, then plot and observe the average silhouette score for each k (a minimal sketch of this scan follows the list). The results I got weren't good enough for me to adopt this method.
  • There are two known methods, the gap statistic method and another, potentially superior method, which can directly return the optimum value of k for a given dataset. I tried to reproduce results from the second method, but the results weren't convincing.
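
The silhouette scan from the first bullet went roughly like this; X stands for any 2-D feature matrix (e.g. tf-idf vectors of the articles), and the function name is mine:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_range=range(2, 26)):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        # Average silhouette over all samples: closer to 1 is better
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores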

With the second, f(K)-based approach, initially I got decent results: ideal_clusters.png

As I tried more iterations, I got some mixed results: not_ideal_results.png

And on further runs, some downright crazy results: reverse_results.png

Following is the code I used for these results:

# Loosely adopted from
# https://datasciencelab.wordpress.com/2014/01/21/selection-of-k-in-k-means-clustering-reloaded/
import matplotlib.pylab as plt
import numpy as np
from sklearn.cluster import KMeans
import random

def init_board_gauss(N, k):
    # Generate N points from k Gaussian blobs centered inside [-5, 5] x [-5, 5]
    n = float(N) / k
    X = []
    for i in range(k):
        c = (random.uniform(-5, 5), random.uniform(-5, 5))
        s = random.uniform(0.05, 0.5)
        x = []
        while len(x) < n:
            a, b = np.random.normal(c[0], s), np.random.normal(c[1], s)
            # Only keep points drawn inside the board
            if abs(a) < 5 and abs(b) < 5:
                x.append([a, b])
        X.extend(x)
    X = np.array(X)[:N]
    return X

# This can be played around with to confirm the behaviour of f(K)
X = init_board_gauss(1200, 6)
fig = plt.figure(figsize=(18, 5))
ax1 = fig.add_subplot(121)
ax1.set_xlim(-5, 5)
ax1.set_ylim(-5, 5)
ax1.plot(X[:, 0], X[:, 1], '.', alpha=0.5)
tit1 = 'N=%s' % (str(len(X)))
ax1.set_title(tit1, fontsize=16)
ax2 = fig.add_subplot(122)
ax2.set_ylim(0, 1.25)
sk = 0
fs = []
centers = []
for true_k in range(1, 10):
    km = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1,
                verbose=False)
    km.fit(X)
    Nd = len(X[0])
    # Weight factor a(k, Nd) from Pham et al.; 3.0 keeps the division
    # floating point under Python 2 as well
    a = lambda k, Nd: 1 - 3.0/(4*Nd) if k == 2 else a(k-1, Nd) + (1 - a(k-1, Nd))/6.0
    if true_k == 1 or sk == 0:
        # f(K) is defined as 1 when k == 1 or when S(k-1) is zero
        fs.append(1)
    else:
        fs.append(km.inertia_/(a(true_k, Nd)*sk))
    sk = km.inertia_
    centers.append(km.cluster_centers_)

# np.argmin instead of np.where(fs == min(fs)): comparing a plain Python
# list against a scalar does not broadcast, so the original lookup failed
foundfK = np.argmin(fs) + 1
ax1.plot(centers[foundfK-1][:, 0], centers[foundfK-1][:, 1], 'ro', markersize=10)
ax2.plot(range(1, len(fs)+1), fs, 'ro-', alpha=0.6)
ax2.set_xlabel('Number of clusters K', fontsize=16)
ax2.set_ylabel('f(K)', fontsize=16)
tit2 = 'f(K) finds %s clusters' % (foundfK)
ax2.set_title(tit2, fontsize=16)
plt.savefig('detK_N%s.png' % (str(len(X))),
            bbox_inches='tight', dpi=100)
plt.show()

I had many bugs in my initial version of the above code, and I kept fixing them while trying to crosscheck the results. Eventually I read the paper, and in the results section I noticed plots similar to mine (Figs 6n and o), with further explanation which went:

However, no simple explanation could be given for the cases shown in Figs 6n and o. This highlights the fact that f(K) should only be used to suggest a guide value for the number of clusters and the final decision as to which value to adopt has to be left at the discretion of the user.

In the comments section of the DataScienceLab blog someone had mentioned that we won't get this kind of result with real data. But if the proposed solution fails on artificial data itself, I think it can hardly help me get to an appropriate k directly. Maybe what I am seeking itself is not correct; let's see what we are able to put together.

Nayee Taleem

Krunal was visiting Nayee Talim/नई तालीम from December 12th to 21st and asked me to join for a visit and understand the school's model. After the visit to the Bhasha institute, it was one more opportunity to understand how alternative schools work. The philosophy behind the school was formulated by Gandhi long ago, around 1937, and goes something like: learning by doing, and using the real world as a classroom to understand different subjects. In 2005 the school was reopened in Anand Niketan, and since then they have been trying many things to improve the learning environment, working really hard to keep it relevant in current times.

The school compound itself is really beautiful. campus.jpg

We reached Sevagram on the 11th, and from the next day Krunal and Preeti were facilitating a three-day workshop for teachers to explore different ideologies to make the school environment more "children centered". Many models of existing schools from around the world and India were presented and discussed, and participants were encouraged to reflect on whether a similar environment existed in Nayi Taleem, and if not, what the reasons behind that were, whether those models could be adopted for this school, and if yes, how.

On the school campus, many methods, activities and a tailored curriculum are used to teach science, maths and other subjects. They have farms, art installations, craft and sewing classes. In the case of farming, each class is allotted a piece of farm, and they are responsible for growing, maintaining and selling the produce. produce.jpg

A show-and-tell session going on: one of the teachers caught a cobra/naag on the campus. He showed the kids and teachers how it looks and how to identify one, and also mentioned how its bite could be lethal. Later the same teacher released the snake on the remote outskirts of the village. show_and_tell.jpg

For lunch, classes take their turn, and the students along with their teacher prepare lunch for the whole school. Basically, in Krunal's words, Gandhi had an idea that kids should be made familiar with tools and how to use them, and that's what the school seemed to be doing.

Despite all this, where they (school admin, teachers, students) lag is that with current technological developments these tools have changed drastically, and many have been rendered obsolete. For computer education they are using a standard curriculum which introduces kids to Paint, Word and other basic computing skills, and that too very superficially. Computers, mobiles and "smart" devices are everywhere, but without knowing how to use these tools, people in the locality become mere consumers of things which are actively developed somewhere else and might not be meant to be used in the village's local context and for its needs. I personally think diversity is good: different solutions to a problem/challenge bring better insight into the problem, and even better solutions.

We were thinking about how to introduce computers with a DIY approach, so that children can learn these modern tools just like they are learning the other tools. While returning, Krunal mentioned conducting some engaging and fun sessions where we explore different themes (games, a makers-lab kind of setup). I have tried to do this before, but one thing I have realized is that the process of making is slow, and I don't exactly know how to take sound-bites out of it and conduct sessions around them. Also, my knowledge of making itself is limited. I was thinking of doing a gaming session leading to designing a simple game level and then playing it, or doing some more engaging sessions using arduino/mobiles. I have exchanged models/references with Krunal which can be used (1, 2, 3, 4, 5, 6); I will try to explore more on this and see what can be used or developed for Nayee Taleem.

Personally, I am confused and find myself severely incompetent for this particular task. On one side, I can totally relate to the attempts being made to improve AI, projects like home automation. I was reading one HN thread, and there are situations where people depend on alexa or google home to know about the weather outside their own homes. Somehow I can't relate to that vision, where people are so unaware of their surroundings. On the other hand, kids at Nayee Taleem and the Bhasha Institute and many other such places are very aware of their environment; they care for it and nurture it. But I am not confident this narrative will hold up against the widespread and blind adoption of technology. Only time will tell how all this pans out, but try we must.

Bhasha Institute Tejgarh

Thanks to Ruby's relentless efforts (an email chain spread over two months), we finally managed dates with the Bhasha Institute.

What is this Institute?

Ganesh N Devy set up the institute with the idea of providing tribal communities a space where their art, culture, learning and knowledge of their environment could be celebrated, curated and preserved. It is located near the village of Tejgarh, Vadodara, Gujarat. At the moment the administration and all operations are carried out by local people, for the local people.

What's the need?

Read and watch this piece on PARI about "tribal girls sing English songs in a village that doesn't speak English, in honor of the potato that they don't eat." It very accurately depicts the broken education system. Founders and supporters of Bhasha recognized this issue from the start and focused on introducing formal education while keeping it relevant to the local context: enabling locals to take things forward, finding people who understand the local issues and are motivated to take charge and find a possible solution.

How are they doing it?

Bhasha has different "verticals", and they organize workshops regularly for all of them to keep evolving and adapting, with an exchange of skills between locals and visitors. There is a tribal museum for curating local arts; a library/publications wing to document, preserve and publish local knowledge; a small medical team with both allopathy and homeopathy treatments available for locals; and a Shaala, aka school, which works as a support system to get kids ready for mainstream schooling. In the shaala they take local students (aged 8 to 12) belonging to a mix of tribes speaking different dialects/languages/bhasha. They have a multi-lingual teaching system to get students at ease with different dialects/languages, and in the process also introduce formal Gujarati, enabling the kids to read and write. Eventually, after 2 years, with help from the institute they are admitted to schools. Apart from language, they also get taught local skills related to farming, folk songs and their own culture.

What was I doing there?

Ruby, Praful and Sanket had first-hand experience with tribal education at the school running in Nelgunda by Hemalksha, so we had a lot of questions on how things were managed and on the core idea behind the institute. There were reflections and discussions on what is different between tribals of different regions and how Bhasha as an institute is trying to stay relevant. As for me, it was mostly observing: the institute, the activities going on in the campus, how Ruby, Sanket and Praful were interacting with the local kids (they taught the kids two lines from a madia folk song). I wasn't able to contribute back to the local community during this stay, but next time for sure I will.

Using sleepTimeout in JavaScript

tldr: ALWAYS take a brief look at the official developer documentation of functions.

I was trying to rush a release over the weekend, and I had a requirement where I had to make repeated API calls to track the status of a task. Without that, any new user would see a "dead page" with no info on what is going on and how they should proceed. Pseudo code would be something like:

function get_task_status(task_id) {
  $.get("/get_task_status/", {'task_id':task_id})
    .done( function(data) {
      // Update status div
      // Wait for x seconds and repeat this function
    });
}

As usual I hurried to Google to search for template/pointer code. StackOverflow didn't disappoint, and I landed on this discussion. It had decent insight on using a callback function with setTimeout, and I cooked up my own version of it:

function get_task_status(task_id) {
  $.get("/get_task_status/", {'task_id':task_id})
    .done( function(data) {
      if(data['task'] === 'PROGRESS') {
	Materialize.toast(data['info'], 1000);
	setTimeout(get_task_status(task_id), 2000);
      }
    });
}

Looks innocent, right? Well, that's what got me stumped for almost 3-4 hours. I tried this, and my javascript happily ignored setTimeout and its delay, and kept making continuous GET requests. I tried some variants of the above code, but nothing worked. Eventually I came across this post on SO, tried the code, and it worked! I was convinced that there was some issue with how older versions handled setTimeout, and that the 2016 update was what I needed.

Today, as I sat down to put together a brief note on this experience, I was testing setTimeout code on node, in the browser console, and inside an HTML template, and somehow the delay was working just fine each time:

function get_task_status() {
  console.log(Date());
  setTimeout(get_task_status, 2000);
}
> function get_task_status(task_id) {
...   console.log(Date());
...   // Recursive call to this function itself after 2 seconds of delay
...   setTimeout(get_task_status, 2000);
... }
undefined
> get_task_status('something');
Tue Oct 25 2016 15:55:59 GMT+0530 (IST)
undefined
> Tue Oct 25 2016 15:56:01 GMT+0530 (IST)
Tue Oct 25 2016 15:56:03 GMT+0530 (IST)
Tue Oct 25 2016 15:56:05 GMT+0530 (IST)
Tue Oct 25 2016 15:56:07 GMT+0530 (IST)
Tue Oct 25 2016 15:56:09 GMT+0530 (IST)
(To exit, press ^C again or type .exit)

Again, this bummed me. I thought I had "established" that setTimeout was broken and that promises were what I should be looking at and getting a better understanding of. While trying to work out what was wrong, I checked the MDN documentation of the function and finally realized my real bug. The syntax of the function is var timeoutID = window.setTimeout(func[, delay, param1, param2, ...]);

And this is what I was doing: setTimeout(get_task_status(task_id), 2000);

Notice that in the syntax the params come after the delay argument, while I had passed them directly: setTimeout(get_task_status(task_id), 2000) calls get_task_status immediately and schedules its (undefined) return value, so the recursion ran with no delay at all. The correct call is setTimeout(get_task_status, 2000, task_id). That was the small gotcha. I was talking to Syed ji about this experience, and he pointed me to the You Don't Know JS series for a better understanding of javascript concepts and nuances. I learned my lesson: properly RTFM. As for promises, I will return to learn more about them later; at the moment my code is working.

PEBKAC

Yesterday punchagan introduced me to PEBKAC: Problem Exists Between Keyboard And Chair. It was a TIL (Today I Learned). I had experienced this many times before, only I didn't know there was such a term.

Starting in April, at TaxSpanner I was given the task of integrating the ITD Webservices APIs for return filing and other features with our existing stack. The procedure included quite a few things alien to me. The sample code provided by ITD was in Java, they were using something called the Spring framework, our requests had to be routed via a specific proxy approved by ITD, and furthermore we physically needed a USB DSC key registered with ITD to encrypt the communication.

As I was trying to get the first successful run of the API working from my system, I was particularly stuck on accessing the DSC key from my java code. It needed the drivers available here, and the java security config file (/etc/java-7-openjdk/security/java.security) had to be edited properly to use the correct drivers. After doing these things, the first thing I tried was to list the certificates on the USB token using keytool. And on the first run, it worked fine. I was ecstatic: one fewer unknown from the pile of unknowns, right? Wrong. As soon as I tried to run java programs using the DSC, it threw up lines and lines of error which went something like:

org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://cxf.apache.org/core]
Offending resource: class path resource [ClientConfig.xml]

	at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(FailFastProblemReporter.java:68)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:85)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:80)
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.error(BeanDefinitionParserDelegate.java:318)
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.parseCustomElement(BeanDefinitionParserDelegate.java:1435)
	at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.parseCustomElement(BeanDefinitionParserDelegate.java:1428)
	... 
	at itd_webs.core.main(Unknown Source)
2016-05-11 12:03:48,114 [main] WARN  org.apache.cxf.bus.spring.SpringBusFactory -  Failed to create application context.
org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://cxf.apache.org/core]
Offending resource: class path resource [ClientConfig.xml]

Getting the configs in place so that the Spring framework could load the proper credentials from the DSC was another task where inputs from Nandeep proved very crucial. Thankfully we had one more system where this setup with the DSC worked, so it was clear that this particular error related to DSC recognition was limited to my system. After a lot of head scratching, comparing the two systems to identify if something was amiss, and trying strace, nothing helped.

After scrolling through a lot of java-related stackoverflow conversations, I was playing around with keytool and jdb. As I tried jdb, I noticed the DSC blinking, and it occurred to me that maybe the default java was using different configs. I checked my /usr/lib/jvm and indeed there were 4 different versions of java. `java -version` pointed to java version "1.8.0_65", so instead I compiled and ran the command using /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java, and the USB blinked happily ever after. While we kept developing and stabilizing the system on the computer where things were working, it took almost two weeks to narrow down to the exact issue on my system. And now I have a name for all the time spent: PEBKAC. Thank you punchagan.

Using Hekad to parse logs for relevant parts

We at Taxspanner have been looking at different options for an analytics pipeline and setting things up to capture relevant information. Vivek had noticed Hekad and was wondering if we could use the syslogs already being generated by the app. The idea was to look for specific logs in a format which could contain information like app name, model, UUID and the operation being done.

We followed the basic setup guide to get a feel of it, and it was processing nginx logs to the tune of millions very quickly. Apart from the official documentation, this post talks about how to set up a quick filter around the Hekad processing pipeline. We experimented with a client-server setup where a client running on the app server tails the django log file, filters the relevant log messages, and pushes them to a server hekad instance aggregating logs from all the app servers.

This was the client-side hekad config (a TOML file):

[hekad]
maxprocs = 1
base_dir = "."
share_dir = "hekad-location/share/heka/"

# Input is django log file
[django_logs]
type = "LogstreamerInput"
splitter = "TokenSplitter"
log_directory = "logs/"
file_match = 'django\.log'

# SandboxFilter running a Lua script to extract the relevant log lines
[DjangoLogDecoder]
type = "SandboxFilter"
message_matcher = "Logger == 'django_logs'"
filename = "lua_decoders/django_logs.lua"

# Encoder for output streams
[PayloadEncoder]

# We channel output generated from DjangoLogDecoder to a certain UDP port
[UdpOutput]
message_matcher = "Logger == 'DjangoLogDecoder'"
address = ":34567"
encoder = "PayloadEncoder"

The Lua script to filter the relevant logs is pretty small:

local string = require "string"
local table = require "table"

-- This structure could be used in a better way
local msg = {
    Timestamp = nil,
    Type      = nil,
    Payload   = nil,
    Fields    = nil
}

function process_message ()
    local log = read_message("Payload")
    if log == nil then
        return 0
    end
    -- Split the log line on whitespace
    local log_blocks = {}
    for i in string.gmatch(log, "%S+") do
        table.insert(log_blocks, i)
    end
    -- The third block carries the log level; forward only CRITICAL entries
    if #log_blocks >= 4 and log_blocks[3] == "CRITICAL" then
        msg.Payload = log
        inject_message(msg)
    end
    return 0
end

With the client instance in place, we now get our listener config sorted out:

[hekad]
maxprocs = 4
base_dir = "."
share_dir = "hekad-location/share/heka/"

[PayloadEncoder]

# Input listening on a UDP port
[app_logs]
type = "UdpInput"
address = ":34567"

# Output matches the messages received and just prints them
[LogOutput]
message_matcher = "Logger == 'app_logs'"
encoder = "PayloadEncoder"

And that's it: this puts a basic hekad-based pipeline in place which can simply pick relevant information from django logs.

Issues with Indexing while using Cassandra.

We have a single-machine cassandra setup on which we are trying different things for analytics. One of the column families we have goes with this description:

CREATE TABLE playground.event_user_table (
    event_date date,
    event_time timestamp,
    author text,
    content_id text,
    content_model text,
    event_id text,
    event_type text,
    PRIMARY KEY (event_date, event_time)
) WITH CLUSTERING ORDER BY (event_time ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX author ON playground.event_user_table (author);

With this table in place, we populated it with data for different apps/models. Now when we query the system with something like:

cqlsh:playground> select * from event_user_table where event_date = '2013-06-02' ;
 event_date | event_time               | author                                 | content_id                                      | content_model | event_id | event_type
------------+--------------------------+----------------------------------------+-------------------------------------------------+---------------+----------+------------
 2013-06-02 | 2013-06-02 00:00:00+0000 |                 anuragabes@example.com |                 anuragabes@example.com##2##2012 |           A   |          |  submitted
 2013-06-02 | 2013-06-02 01:28:13+0000 |               dear_kirnesh@example.com |                                      1000910424 |           B   |          |     closed
 2013-06-02 | 2013-06-02 01:59:31+0000 |                  dsammaiah@example.com |                  dsammaiah@example.com##1##2013 |           A   |          |    created
 2013-06-02 | 2013-06-02 02:00:44+0000 |                     jacobg@example.com |                     jacobg@example.com##5##2013 |           A   |          |    created
 2013-06-02 | 2013-06-02 02:02:16+0000 |                kriti.jneja@example.com |                kriti.jneja@example.com##4##2013 |           A   |          |    created

The result looks good, as expected. But now when I query the system on the secondary index on author, I get empty or partial results:

cqlsh:playground> select * from event_user_table where author = 'anuragabes@example.com' ;
 event_date | event_time | author | content_id | content_model | event_id | event_type
------------+------------+--------+------------+---------------+----------+------------

(0 rows)

cqlsh:playground> select * from event_user_table where author = 'senthil.ramachandran@example.com' ;
 event_date | event_time               | author                             | content_id | content_model | event_id | event_type
------------+--------------------------+------------------------------------+------------+---------------+----------+------------
 2014-01-18 | 2014-01-18 09:01:52+0000 |   senthil.ramachandran@example.com | 1001068325 |           SRF |          |     closed

(1 rows)

I have tried this combination for the PRIMARY KEY too, ((event_date, event_time), author), but with the same results. There are known issues with secondary indexes and scaling[1], but do they affect single-node systems too? I am not sure about it. Time to confirm things.

Update 1 <2016-02-10 Wed 15:45>: As mentioned here[2], Cassandra has "'lazy' updating" of secondary indexes: "When you change an indexed value, you need to remove the old value from the index." Could that be the reason?
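
For what it's worth, the standard Cassandra advice I keep running into is to skip the secondary index altogether and denormalize into a table keyed by the query; a sketch (the table name is mine):

CREATE TABLE playground.events_by_author (
    author text,
    event_time timestamp,
    event_date date,
    content_id text,
    content_model text,
    event_id text,
    event_type text,
    PRIMARY KEY (author, event_time)
) WITH CLUSTERING ORDER BY (event_time ASC);

-- author is now the partition key, so this lookup is a direct hit:
-- select * from events_by_author where author = 'anuragabes@example.com';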

Extending Zulip to provide a Chat-With-Us Helpdesk interface.

Here is the github repo: https://github.com/baali/chat-with-us

We at TaxSpanner had been using olark to help our customers over a chat interface. While their interface is very mature and delivers all the features they mention, we were not able to manage things in terms of how a customer was reaching out to us via different mediums (email, chat, phone). As Zulip got released and we were giving it a try for our internal team communication (both tech and sales teams), the idea was floated: could we extend the same interface to our customers too?

We set the target of getting a prototype in place which could replace olark; if during trials the feature requests proved reasonable and manageable, we would take a call. The idea was to expose a limited view to customers while keeping the full-featured interface for our support team. We created two stripped-down, simple html templates to expose to customers, and got a view in place along the lines of zulip's home view, with additional logic to route customers to the sales team: notifications for offline and online sales teams, plus small checks in the main zulip views to make sure the full views are exposed only to the internal team.

For example, in the function get_status_dict in zerver/lib/actions.py there is a hook for MIT users,

# Return no status info for regular users
if requesting_user_profile.realm.domain != settings.ADMIN_DOMAIN:
    return defaultdict(dict)

and an additional check in the home view:

from zerver.forms import has_valid_realm
if not has_valid_realm(request.user.email):
    from django.contrib.auth import logout
    # making sure user is logged out from session.
    logout(request)
    return redirect('/support')

For adding this interface on the landing page we added the following HTML:

<div id="zulip" style="bottom: 0px; right: 0px; position: fixed; z-index: 9999;" >
  <div>
    <div style="width: 200px; height: 32px;" id="zulip-chat">
      <div>
	<ul>
	  <li>
	    <span class="icon--speech-bubble"></span>
	    Chat with Us!
	  </li>
	</ul>
      </div>
    </div>
    <div>
      <br>
      <iframe id="zulip-iframe" height="350px" width="450px" style="border:1px solid gray; display: none;" src=""></iframe>
    </div>
  </div>
</div>

and this JavaScript code to handle user clicks:

<script>
  $('#zulip-chat').on('click', function(e) {
    if ($('#zulip-iframe').is(':hidden')) {
      $('#zulip-iframe').attr('src', 'https://url-to-zulip-instance');
    }
    $('#zulip-iframe').toggle("slow");
  });
</script>

At the moment we auto-login the customer after asking for their email-id. For security purposes, we create a new private stream for the customer and unsubscribe them from previously existing streams. Ideally there should be a way (OAuth or server-to-server authentication) to make sure the user is logged in on the main site, and then surface the history of chats they have already had.
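
Roughly, the flow looks like this; every helper below is a hypothetical name standing in for the zulip internals we wired together, not zulip's actual API:

from django.shortcuts import redirect

def start_customer_chat(request):
    email = request.POST['email']
    # All four helpers are hypothetical placeholders
    user = get_or_create_customer_profile(email)
    stream = create_private_stream('support-%s' % email)
    unsubscribe_from_existing_streams(user, keep=stream)
    login_customer_session(request, user)
    return redirect('/support')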

I am not exactly sure about this (ab)use-case of Zulip, and quite possibly there is something I am overlooking or missing. But zulip has a really strong chat interface, and if we can integrate our APIs around it, it will give us a lot of control. Adding bots, notifications, some simple intelligence, and developing things on top of it could enable and extend the existing web application in a lot of ways.