About me

TL;DR

My name is Aleksandar, but people generally just call me Sasho. I live in London, where I work as an NLP Scientist for Babylon Health. My interests lie mostly in natural language processing, machine learning, information extraction, text mining, and health informatics. When I’m not working, I enjoy cycling, video games, climbing and squash.

The long story

I was born in the ancient city of Plovdiv to a family of accountants. I showed an early interest in mathematics and computers, which I pursued in high school. At 19 I moved to Germany, where I studied Computational Linguistics for six years, earning BA and MA degrees from the University of Tübingen. On the NLP side, my training focused on classical parsing formalisms, basic linguistic theory, finite state automata, statistics and what would now be called machine learning for NLP. On the technical side, my degrees included data structures and algorithms, as well as several programming courses in Perl and Java. I particularly enjoyed learning about machine learning, FSAs and phonetics. My interest in the latter two motivated me to develop this transcription generation program for Bulgarian. It is hard to talk about Tübingen without mentioning the beauty of the town itself and the serene nature of the German South surrounding it.

After graduating, I moved to Sofia, where I took a contract job at the Bulgarian Academy of Sciences, developing NLP technology for Bulgarian-English machine translation. My work there focused on developing machine learning models for part-of-speech tagging (which is particularly challenging in Bulgarian) and dependency parsing. Most of this work made it into a language processing pipeline that was used in a wider machine translation project under the Seventh Research Framework Programme.

In 2012 I left Sofia to pursue a PhD in Natural Language Processing at the University of Sussex in Brighton. My supervisor Prof. John Carroll and I focused on using machine learning to extract structured information from the free-text part of primary care records. You can read all about it in my thesis, but the gist of it is that we added manual chunk and named entity annotations to some 900 primary care records, and trained machine learning models that automatically detected symptoms, drugs and diseases. Towards the end of my PhD, I started working for eRevalue — a business intelligence startup aiming to disrupt corporate sustainability reporting. As a Data Scientist there, I focused on analysing online news and annual corporate reports, which included data ingestion, information extraction, analysis, and presentation. I was part of a team that designed and tested an innovative ontology for sustainability reporting featuring 100 topics and over 6,000 terms.

After a year and a half there I joined Babylon, where I am currently an NLP Scientist working on word sense disambiguation, entity/graph embeddings, information extraction and text summarisation.