So this is not a run-of-the-mill conference for a DBA, but in my evolving role it is good to get out there and experience a conference that is a little different from the normal data conferences I go to. QCon is primarily aimed at software developers, has been running for about 10 years, and now takes place all over the world. On this first day, apart from the keynotes, I stuck exclusively to the “Stream Processing @ Scale” track, as this is the focus of our project at work at the moment.
QCon London Day 1: Keynote – UNEVENLY DISTRIBUTED by Adrian Colyer
Adrian reads a paper a day, summarises it and then publishes it on “the morning paper”. That is no mean feat, and I admire him for the dedication this must take! This was an interesting keynote that extolled the virtues of reading a paper a day because:
- They are great thinking tools – They get you to think about what people are doing and what you could do or try
- They raise your expectations – Make your solutions better, or what you think you should be getting as a solution
- They give you real life lessons you can learn from – Read about what people have implemented, or given to the community to implement for themselves
- They are a great conversation – Can see how ideas progress through time, who has built what on top of what
- They are unevenly distributed – Across subjects
Basically: read more papers and you will know more stuff and be more awesome at your job, bringing more to the table for your employer and yourself.
QCon London Day 1: Talks
PATTERNS OF RELIABLE IN-STREAM PROCESSING @ SCALE – Alexey Kharlamov
This was a rather short talk, but it was interesting and started a theme for the day, which has left me with a question that has not yet been fully answered. Alexey went through all the different patterns that his company has gone through to process data in streams. They had tried the Lambda and Kappa architectures and were now working towards something else, but that was not elaborated on.
- You need event time as well as event capture time for proper windowing. This reinforces what we see at my workplace and validates everything else that is out there
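To make the event time point concrete, here is a minimal Python sketch. It is my own illustration, not code from the talk, and the field names (`event_time`, `capture_time`), the 60 second window size and the sample events are all made up. It counts the same events in tumbling windows twice, once keyed on when the event happened and once on when it was captured.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling window size, purely illustrative


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp."""
    return timestamp - (timestamp % size)


def count_by_window(events, time_field):
    """Count events per tumbling window, using the given timestamp field."""
    counts = defaultdict(int)
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Hypothetical events: "b" happened at second 55 but was only captured at 61.
events = [
    {"id": "a", "event_time": 10, "capture_time": 12},
    {"id": "b", "event_time": 55, "capture_time": 61},
    {"id": "c", "event_time": 70, "capture_time": 71},
]

by_event_time = count_by_window(events, "event_time")
by_capture_time = count_by_window(events, "capture_time")
```

Event "b" lands in the first window when you use event time but in the second when you use capture time, so the two sets of counts disagree. Having both timestamps lets you see (and correct for) that delay.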
STREAM PROCESSING WITH APACHE FLINK – Robert Metzger
Flink is a new product, in a way, and will get its full release tomorrow (08/03/2016). It promises to completely subsume batch by allowing windowing over “large” timescales, using in-memory and disk-persisted aggregations, as well as offering a host of other interesting features that other systems do not.
- Google Dataflow is being made into an Apache incubator project called Apache Beam
MICROSERVICES FOR A STREAMING WORLD – Ben Stopford
This is a brave new world, and it's a world where jobs that you (I) would traditionally use a database for (looking up values) can now be done with variations of the open source streaming projects. This talk looked at an add-on for Apache Kafka called KStreams that allows you to persist the latest version of a key so that it can be used by a microservice, in combination with a stream, to create other services. We also need to embrace decentralisation.
- KStreams can be used to make KTables that can be joined with data from a stream to enable querying for a microservice
- Kafka has compacted topics that allow you to store the latest value for a key if you so wish!
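The two ideas above can be sketched in plain Python. This is a toy model of my own and does not use the Kafka or KStreams APIs; the function names `compact` and `enrich` and the customer and order data are hypothetical. It only shows the shape of the idea: a changelog collapses to the latest value per key, and that "table" is then used to look up values for events arriving on a stream.

```python
def compact(log):
    """Collapse a (key, value) changelog to the latest value per key."""
    table = {}
    for key, value in log:
        table[key] = value  # later entries overwrite earlier ones
    return table


def enrich(stream, table):
    """Join each stream event with the table's current value for its key."""
    return [
        {**event, "customer_name": table.get(event["customer_id"])}
        for event in stream
    ]


# Hypothetical changelog: customer 1 is updated once.
customer_log = [(1, "Ada"), (2, "Alan"), (1, "Ada Lovelace")]
customers = compact(customer_log)

# Hypothetical stream of orders; customer 3 is unknown to the table.
orders = [
    {"order": 100, "customer_id": 1},
    {"order": 101, "customer_id": 3},
]
enriched = enrich(orders, customers)
```

The lookup that I would normally push to a database happens against the in-memory table instead. In this sketch an order with no matching key simply gets `None` for the name.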
STREAMING AUTO-SCALING IN GOOGLE CLOUD DATAFLOW – Manuel Fahndrich
This talk seemed like quite a long explanation of the planning that goes into the “auto scaler” deciding whether to scale or not. It was interesting to see some of the formulas, but in practice this is hidden from the user behind the developer console, so it was for informational purposes only. Manuel also went through some of the challenges that they have not solved yet, e.g. Quanta. Quanta is where, as you downsize the number of machines (and the virtual disks that sit behind them), you end up with an uneven distribution, and therefore other machines can only process at the rate of the machine with the least disks.
- Google are still going to make money from you even if you enable auto scaling, just maybe not as much!
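The uneven distribution behind the Quanta problem is easy to see with a bit of arithmetic. The sketch below is my own and is not the formula from the talk. It assumes a fixed pool of disks shared out as evenly as possible across the workers, and the numbers (15 disks, 8 or 5 workers) are invented.

```python
def disks_per_worker(total_disks, workers):
    """Share disks across workers as evenly as whole disks allow."""
    base, extra = divmod(total_disks, workers)
    # 'extra' workers get one more disk than the rest.
    return [base + 1] * extra + [base] * (workers - extra)


def spread(allocation):
    """Difference between the most and least loaded worker."""
    return max(allocation) - min(allocation)


uneven = disks_per_worker(15, 8)  # 15 does not divide evenly by 8
even = disks_per_worker(15, 5)    # 15 divides evenly by 5
```

With 8 workers, seven of them get 2 disks and one gets 1, so the work cannot be balanced. With 5 workers every machine gets 3 disks. Only some worker counts give an even split, which is why scaling in steps of one machine is not always useful.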
DATA STREAMING OPEN SPACE
This was not very well subscribed, with only me, the facilitator and three other participants there. I was hoping for more people, so I could learn about what others are doing, what they have tried, and what to avoid! As it turned out, I was the second most experienced with streaming there, which, given how early we are in our own usage, makes me slightly worried about what the rest of the world is doing.
- Streaming is new, and not many people are sharing what they are doing, if they are doing it at all.
REALTIME STREAM COMPUTING & ANALYTICS @ UBER – Sudhir Tonse
Good to see what a disruptive tech company is doing, and to see that they are building their own tools because they can’t find any that support their needs. The talk took you through the Uber tech stack for producing data, processing data, storing data, querying and consumption.
- Uber's world is hexagons
- There are loads of tools out there, and new ones come out all the time; use the one that best suits your needs at the time you need it. Change only when there is a better one, not just because it is two weeks later
QCon London Day 1: Evening Keynote – BLT: BABBAGE LOVELACE TURING (SO WHO DID INVENT THAT COMPUTER?) – John Graham-Cumming & Sydney Padua
Found this a little long and not quite the content I was hoping for; I was thinking drunk histories. It was interesting and well put together, but pondered some points too much.
Questions From Today
- Why do people use Apache X, Y and Z and manage all of that themselves rather than using an “autoscaling solution” such as GCS?
- Why, if there are so many people using Apache X, Y and Z, are there not more people talking about it in production, apart from large “disruptive” organisations such as Uber?
- Why, if we have the ability to output this data to so many different (heterogeneous) stores, are there very few (if any) tools that pull it all together again?