After finishing my work with TaxSpanner, I worked on a personal project, SoFee, for around six months. In this post I am documenting the idea behind it, what I expected it to become, and where I left it off (till now).
I have realized that in many of my personal projects, I work broadly around archiving the content I read and share online, be it news articles, blogs, tweet threads or videos. This form of data feels like sand that keeps slipping away as I try to hold it, to keep it fresh, accessible and indexed for reference. This got triggered after the sunset of Google Reader. Punchagan had introduced me to Google Reader, and I soon started following a lot of people there and used its browser extension to archive the content I was reading. In some way, with SoFee, I was trying to recreate the Google Reader experience with the people I was following on Twitter. The first iteration of the project was just that: it would give an OPML file which could be added to any feed-reader, and I would get a separate feed for each person I was following.
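To give a feel for that first iteration, here is a minimal sketch of building such an OPML file in Python. It is not the code SoFee actually ran: the function name `build_opml`, the `(name, feed_url)` input shape and the example URLs are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML 2.0 document as a string, one outline per feed.

    `feeds` is a list of (name, feed_url) pairs. This input shape is a
    hypothetical one chosen for the sketch, not SoFee's real data model.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # Feed-readers look at xmlUrl to find the feed to subscribe to.
        ET.SubElement(body, "outline", type="rss", text=name, title=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


# Hypothetical per-person feed URLs, only to show the call.
document = build_opml(
    "People I follow",
    [
        ("alice", "https://example.invalid/feeds/alice.xml"),
        ("bob", "https://example.invalid/feeds/bob.xml"),
    ],
)
```

Most feed-readers can import a file like this, which is what made it possible to leave the reading interface to them.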
Archiving the content of links
Personally Trained Model
The last feature I had thought of including was a personally trained model which could segregate these links into separate groups or categories. Both Facebook and Twitter were messing around with the timeline, and I didn't want that to happen to mine. I wanted a way to control it myself, in a way which suited me. As a first step, I separated my Twitter timeline into tweets which had links and others which were just plain tweets or updates. Secondly, I listed all these tweets in chronological order. With content extracted using Readability, I experimented with unsupervised learning (KMeans, LDA, visualization of results) to create dynamic groups, but the results weren't satisfying enough to be included as a feature. For supervised learning, I was thinking of having a default model based on Reddit categories or the Wikipedia API, which could create a generic, simple data set, and then allowing the user to reinforce and steer the grouping to their liking. Eventually, users would have a private, personal model which could be applied to any article, news site or source, and it would give them the clustering they want. Again, I failed to put this feature together.
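The first two steps, splitting the timeline and ordering it, are simple enough to sketch. The snippet below is only an illustration under assumptions: each tweet is a plain dict with a `text` string and a `created_at` datetime (not the shape the Twitter API returns), and a link is detected with a naive regular expression.

```python
import re
from datetime import datetime

# Naive link detector, good enough for a sketch.
URL_RE = re.compile(r"https?://\S+")


def split_timeline(tweets):
    """Split tweets into (with_links, updates), each oldest first.

    `tweets` is a list of dicts with "text" and "created_at" keys,
    a hypothetical shape used only for this example.
    """
    with_links, updates = [], []
    for tweet in sorted(tweets, key=lambda t: t["created_at"]):
        if URL_RE.search(tweet["text"]):
            with_links.append(tweet)
        else:
            updates.append(tweet)
    return with_links, updates


timeline = [
    {"text": "Just landed in Pune", "created_at": datetime(2018, 1, 3, 9, 0)},
    {"text": "Good read https://example.invalid/post", "created_at": datetime(2018, 1, 2, 18, 30)},
    {"text": "Slides: http://example.invalid/talk", "created_at": datetime(2018, 1, 1, 8, 15)},
]
links, updates = split_timeline(timeline)
```

Only the tweets in the first list would go on to content extraction and, later, to whatever grouping model is applied.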
What I ended up with and future plans
Initially, I didn't want to get into UI and UX, and wanted to leave that part to popular and established feed-readers. But that slowed down user onboarding and feedback. I eventually ended up with a small web interface where the links and their content were listed, and the timeline was updated every three hours or so. I stopped working on this project when I started working with Senic, and the project kept running for well over a year. Now it's non-functional, but I learned a lot while putting together what was working. It is a pretty simple project, where we could simply charge users a fee for hosting their content on their own small VPS instance or for running a lambda service (to fetch the updated timeline and apply their model to cluster the data), and allow them full control of their data (usage, deletion, updates). I will for sure use my learnings to put together a more complete version of the project with SoFee2.0; let's see when that happens (2019 resolution?).