Python June Meetup

written by Israel Saeta Pérez on 2018-06-21

Talks

How does that PySpark thing work? And why Arrow makes it faster?

by Rubén Berenguel (@berenguel) (English, 45 min)

Back in ye olde days of Spark, using Python with Spark was an exercise in patience. Data was constantly moving back and forth between Python and Scala, being serialised at every step. Leveraging SparkSQL and avoiding UDFs made things better, as did the steady improvement of the optimisers (Catalyst and Tungsten). But with Spark 2.3, PySpark has sped up tremendously thanks to the (still experimental) addition of the Arrow serialisers.

In this talk we will learn how PySpark has improved its performance in Apache Spark 2.3 by using Apache Arrow. To do this, we will travel through the internals of Spark to see how Python interacts with the Scala core, and through some of the internals of Pandas to see how data moves from Python to Scala via Arrow.

Slides available at: https://github.com/rberenguel/pyspark-arrow-pandas

Frozen Python: sea ice dataloggers

by Oriol Sánchez (English, 35 min)

Project website: https://headingnorthweb.wordpress.com/tag/en-us/

In this talk we will learn about sea ice buoys and embedded scientific observation platforms in extreme environments.

We will learn how to use Python to write drivers, custom data protocols, and sensor and GPS integrations. We will also take a look at data visualisation and share some painful engineering stories about bringing data from the field to your server.
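As a purely hypothetical illustration of a "custom data protocol" for a datalogger (the field layout, scaling factors and function names below are our own assumptions, not the format used on the buoys), a compact fixed-size binary record with a checksum can be built with only the standard library:

```python
import struct
import zlib

# Hypothetical little-endian record layout (an assumption, not the talk's protocol):
#   uint32 Unix timestamp
#   int32  latitude in 1e-7 degrees
#   int32  longitude in 1e-7 degrees
#   int16  temperature in hundredths of a degree Celsius
# followed by a uint32 CRC32 of those 14 payload bytes.
PAYLOAD = struct.Struct("<Iiih")
CRC = struct.Struct("<I")
RECORD_SIZE = PAYLOAD.size + CRC.size  # 14 + 4 = 18 bytes


def pack_record(ts: int, lat: float, lon: float, temp_c: float) -> bytes:
    """Scale floats to fixed-point integers, pack them, and append a CRC32."""
    payload = PAYLOAD.pack(ts, round(lat * 1e7), round(lon * 1e7), round(temp_c * 100))
    return payload + CRC.pack(zlib.crc32(payload))


def unpack_record(frame: bytes) -> tuple:
    """Verify length and CRC32, then decode back to (ts, lat, lon, temp_c)."""
    if len(frame) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(frame)}")
    payload = frame[:PAYLOAD.size]
    (crc,) = CRC.unpack(frame[PAYLOAD.size:])
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: record corrupted in transit")
    ts, lat, lon, temp = PAYLOAD.unpack(payload)
    return ts, lat / 1e7, lon / 1e7, temp / 100
```

Fixed-point integers keep each record small (18 bytes here), which matters on low-bandwidth satellite links, and the CRC lets the server reject frames damaged on the way.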
