3rd CEPR-Imperial-Plato Market Innovator (MI3) Conference 2019 – BMLL Data – Feature Generation at Scale




Elliot Banks (BMLL)


Elliot Banks is a Lead Data Scientist at BMLL Technologies. He studied for his PhD at Imperial College, focusing on gravitational physics. That work led to a broader interest in machine learning, data science and artificial intelligence, which he now applies in his role at BMLL.


Main Takeaways and Quotes


Banks’ presentation gives an overview of BMLL’s approach and practices, and of the impact on clients of the actionable insights that its data model enables. It begins by tracing four key trends that have shaped market structure for those clients:

  • Regulation
  • Technology
  • Electronification
  • Market fragmentation


He goes on to discuss the role of digital collaboration: open source has enabled significantly increased adoption over the last 15 years. Early data aggregation took place in Microsoft Excel and processed only 300 aggregates. By 2018 this had grown to complete analysis of the order book, with cloud-hosted, scalable solutions that integrate machine learning dominating the world of data.


“We’re getting more and more data, so it becomes more and more difficult to actually be able to process and get a handle on that data […] we’ve got a greater set of tools so that we can actually process and use that data”


This new algo-driven paradigm, born from the four trends outlined above, has placed data science and machine learning at the core of every market participant’s practice. Raw trade data is normalised, analysed and fed into machine learning models to produce research.


“You’ll need some sort of way to ingest that data, clean it, normalise it, store it and look for some sort of features you can actually use to extract value from it.”
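The stages named in the quote above (ingest, clean, normalise, store, extract features) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not BMLL’s pipeline: the three-column CSV input, the `Trade` record, the in-memory dictionary used as a store and the choice of volume-weighted average price (VWAP) as the feature are all hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical normalised trade record."""
    symbol: str
    price: float
    size: int


def ingest(lines):
    """Ingest: parse raw CSV lines of the assumed form 'symbol,price,size'."""
    return csv.reader(lines)


def clean(rows):
    """Clean: drop malformed rows and rows with non-positive price or size."""
    for row in rows:
        if len(row) != 3:
            continue
        symbol, price, size = row
        try:
            p, s = float(price), int(size)
        except ValueError:
            continue
        if symbol.strip() and p > 0 and s > 0:
            yield symbol, p, s


def normalise(rows):
    """Normalise: give every symbol one canonical spelling."""
    for symbol, price, size in rows:
        yield Trade(symbol.strip().upper(), price, size)


def store(trades):
    """Store: group trades by symbol (an in-memory stand-in for a database)."""
    by_symbol = {}
    for trade in trades:
        by_symbol.setdefault(trade.symbol, []).append(trade)
    return by_symbol


def vwap(trades):
    """Feature: volume-weighted average price of a non-empty list of trades."""
    volume = sum(t.size for t in trades)
    return sum(t.price * t.size for t in trades) / volume


def run_pipeline(lines):
    """Chain the stages and return one VWAP feature per symbol."""
    table = store(normalise(clean(ingest(lines))))
    return {symbol: vwap(trades) for symbol, trades in table.items()}


# Made-up sample input, including rows that the clean step should reject.
RAW = ["vod,100.0,10", " VOD ,102.0,30", "bp,abc,5", "BP,500.0,0", "bp,400.0,20"]
print(run_pipeline(RAW))
```

Each stage is a small generator, so the stages can be swapped or tested on their own. A production system would replace the dictionary with durable storage and stream the input rather than hold it in memory.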


BMLL’s objective is to enable clients to interact with millions of datapoints intuitively and insightfully. Its dashboard system provides interactive graphs and analytics that present data in a highly visual, ultimately actionable form.

“A feature is anything we can use to understand and make a prediction that will help our model move forward […] Features are hard to generate – it’s the thing that you need knowledge and understanding of what you are trying to do in order to generate and come up with things that may give you insight and give you an understanding of the market.”
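As a concrete illustration of what a feature can look like, the sketch below derives three commonly used quantities from a single top-of-book snapshot: the bid-ask spread, the mid-price and the size imbalance. The talk summary does not say which features BMLL computes, so the `BookSnapshot` type, the field names and the choice of features are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookSnapshot:
    """Hypothetical best bid and ask of an order book at one instant."""
    bid_price: float
    bid_size: int
    ask_price: float
    ask_size: int


def book_features(snap):
    """Return a few simple features computed from one snapshot."""
    total_size = snap.bid_size + snap.ask_size
    return {
        # Distance between the best ask and the best bid.
        "spread": snap.ask_price - snap.bid_price,
        # Midpoint of the best bid and the best ask.
        "mid": (snap.ask_price + snap.bid_price) / 2,
        # +1 means all resting size is on the bid, -1 all on the ask.
        # Defined as 0.0 for an empty book to avoid dividing by zero.
        "imbalance": (snap.bid_size - snap.ask_size) / total_size if total_size else 0.0,
    }


# Made-up numbers for illustration only.
example = BookSnapshot(bid_price=99.5, bid_size=300, ask_price=100.5, ask_size=100)
print(book_features(example))
```

Which features carry predictive value depends on the problem being modelled, which is the point of the quote: choosing them requires knowledge of the market, not just the ability to compute them.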


This poses a number of challenges: the data itself may need to be sourced, processed or output differently in order to provide the optimal market insights. Furthermore, the “FTSE 100 alone produces over 15TB of data a year – it’s not a small amount of data.” BMLL’s solution for its data engineering pipelines leverages AWS (Amazon Web Services) and Spark, freeing clients to focus their time on problem solving.


“These technologies have meant that time spent as a researcher is time spent thinking about the problem and the solutions you want to generate instead of spending time having to do the data engineering part. That is time consuming and expensive […]”


Banks concludes his keynote by explaining that the upshot of BMLL’s offering is that “more time as a researcher can be spent doing research.”



Images: Muhammad Ashraf ©2019