Lightning talks session
Adrián García Riber, Albert Pujol Torras, Alvaro Duran Barata, Diego Quintana, Edgar Sarria, Miquel Sàrrias, Monica Dominguez, Rajdeep Pal, Ricardo Ander-Egg Aguilar, Sergi Baena, Thais Ruiz de Alda
Sala d'actes Ada Lovelace
Language: English / Spanish
Topic: Multiple topics
How Data Scientists Can Be More Productive with the Power of Static Typing - Alvaro Duran
Data scientists frequently need to verify that some intermediate step in a data pipeline works as expected, and tools such as Jupyter notebooks, where each step is explicitly shown to the user, are very useful for that. However, this approach can become cumbersome and time-consuming as the data pipeline grows more complex.
That's where static typing comes in. By using types in our code, we can introduce our data into a structured pipeline and catch errors early on, before they have the chance to cause any real problems. This is particularly useful when working with large datasets, where a small error can have a significant impact on our results.
But what exactly do we mean by 'types'? Essentially, types are a way of specifying the kind of data that a variable or function will handle. For example, we might specify that a certain variable should only contain integers, or that a certain function should only accept strings.
By using types, we can catch errors during development, rather than having to wait until runtime to discover them. This can save us a lot of time and effort in the long run.
Now, you might be thinking, 'but doesn't introducing types make our code more complex?' And the answer is, not necessarily. In fact, Python has built-in support for types that makes it easy to incorporate them into our code.
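As a rough illustration of that built-in support, the sketch below annotates two small functions with type hints. The function names and the sample column are hypothetical, and the comment about a type checker assumes an external tool such as mypy, which the abstract does not name.

```python
from typing import Optional


def mean_age(ages: list[int]) -> float:
    """Return the average of a non-empty list of integer ages."""
    return sum(ages) / len(ages)


def parse_age(raw: str) -> Optional[int]:
    """Turn a raw string into an int, or None if it is not a plain number."""
    raw = raw.strip()
    return int(raw) if raw.isdigit() else None


# Hypothetical raw column, as it might arrive from a CSV file.
raw_column = ["31", " 45", "n/a", "27"]
parsed = [parse_age(value) for value in raw_column]

# A static checker such as mypy (an assumption, not named in the abstract)
# would flag mean_age(parsed), because parsed is list[Optional[int]] and
# not list[int]. Dropping the None values first satisfies the annotation.
clean_ages = [age for age in parsed if age is not None]
average = mean_age(clean_ages)
```

The annotations do not change how the code runs; they only give a checker enough information to report the unparsed value before the pipeline is executed.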
Of course, that is just the first part of the puzzle. Good type systems are analogous to test suites: they require some effort, but the payoff is immense.
This talk will cover the very basics of how data scientists can benefit from using types, not only in their production code but even in crude prototypes, gaining some peace of mind by making illegal states unrepresentable.
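One possible reading of "making illegal states unrepresentable" is sketched below. Instead of a single record with optional `rows_written` and `error` fields that could both be set at once, each outcome of a pipeline step gets its own class. All class and function names here are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pending:
    """The step has not run yet, so it carries neither a result nor an error."""


@dataclass(frozen=True)
class Succeeded:
    """The step finished; only a row count makes sense here."""
    rows_written: int


@dataclass(frozen=True)
class Failed:
    """The step failed; only an error message makes sense here."""
    error: str


# A step result is exactly one of the three cases above.
StepResult = Union[Pending, Succeeded, Failed]


def describe(result: StepResult) -> str:
    """Summarise a step result for a log line."""
    if isinstance(result, Pending):
        return "pending"
    if isinstance(result, Succeeded):
        return f"ok: {result.rows_written} rows"
    return f"failed: {result.error}"
```

With this shape there is no way to construct a result that is both successful and failed, so that combination never needs to be tested for.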
Astronomical Data Sonification - Adrián García Riber
The current development of massive astronomical archives and virtual observatory (VO) technology offers a wide range of data products and services that can be explored with a personal computer through interoperable technology. The use of sonification and musification in multimodal displays for the exploration of astronomical data offers an additional domain, complementary to visualization. It allows researchers to immerse themselves in their case studies while navigating the massive downloads of big data generated by space telescopes, and it makes stellar catalogs and databases more accessible to blind and visually impaired (BVI) users. In this talk I would like to introduce the collection of Python and TensorFlow sonification prototypes developed during my PhD to generate multimodal representations of astronomical data using deep learning and neural networks.
Python models and alerts within dbt with dbt-fal and novu - Diego Quintana
How to trigger alarms and run Python models within dbt with dbt-fal and novu.
dbt is the T in the ELT pipeline. Although dbt has allowed Python models since version 1.3, its support is limited. dbt-fal enables running Python models, and Python code in general, within a model. A simple demo will be shown in which a forecasting model is trained as part of a dbt pipeline and an alarm is triggered with novu, an open source notification infrastructure.
Serverless Machine Learning: Streamlining Deployment and Scaling of ML Services - Rajdeep Pal
Deploying machine learning services on a serverless architecture means leveraging cloud computing services to simplify the deployment and scaling of machine learning models. Using a serverless approach, developers can focus on writing code for their ML models without worrying about infrastructure setup and management. The serverless model also provides automatic scaling of resources to meet the demands of incoming requests, reducing the need for manual intervention and helping services remain highly available. This talk will cover the benefits of using a serverless approach to deploy machine learning services and provide practical guidance on how to implement it effectively. It will also explore the challenges and trade-offs involved in using a serverless architecture for ML services, such as performance considerations, security, and cost optimisation.
Practical considerations when using ChatGPT in production for multilanguage processing - Albert Pujol Torras
We will describe the results of empirical measurements of how LLMs behave when automating NLP processes, depending on both the language in which the prompt is written and the language of the text to be analyzed. We will also cover some practical considerations to take into account when putting this type of model into production.
International Data Spaces: Implementing the European Strategy on Data - Miquel Sarrias
The International Data Spaces (IDS) initiative aims at cross-sectoral data sovereignty and data interoperability. It sets forth a Reference Architecture Model based on open standards and contributing to global standards. By specifying data usage constraints, it defines the terms and conditions for the data economy. Data spaces are key to the global digital economy. The European Commission is defining Europe’s path forward into the digital economy of the future. A core element of its vision is international data spaces grounded in European values of trust and in data providers' self-determination over how their data is used, which we call data sovereignty.
Forecasting in the integral water cycle - Sergi Baena-Miret
Droughts can have severe impacts on water availability, demand, and quality, affecting water allocation, infrastructure planning, and environmental protection. Accurate forecasting of droughts is essential for mitigating their impacts by enabling effective water management strategies. Various techniques such as statistical models, machine learning, and remote sensing are used for drought forecasting in the integral water cycle. However, data scarcity, non-stationarity, and uncertainty in climate change projections pose significant challenges for accurate forecasting.
MusicMetaData & algorithmic diversity perspective - Thais Ruiz de Alda
We want to talk about the current challenge the music industry is facing around metadata, and the challenges that including a gender perspective can imply. The talk will cover the story of music metadata and how Digitalfems is challenging the status quo of descriptive music metadata to develop an ADM system that will include gender diversity in a beta testing/lab environment, with the support of different institutions such as FECYT, UPC and other relevant stakeholders.
Free online Data Apps with Shimoku Library - Edgar Sarria
We will show how to use Shimoku instead of Streamlit to build Data Products in a few lines of code.
Data Science in Transport Modelling - Monica Dominguez
Transport modelling and traffic simulation have traditionally been branches of Civil Engineering and Applied Mathematics. In this talk, we will present how the field is embracing Data Science and Machine Learning methodologies for data processing, pattern analysis, forecasting and prediction of recurrent and non-recurrent events.
Bundling a Python app in a single file - Ricardo Ander-Egg
Live coding talk. This talk will show how to bundle a Python app and all of its dependencies in a single file. Then we will copy this file to a different machine and see how the script runs without having to deal with Docker or virtual environments.
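The abstract does not say which bundling tool will be used. As one possible sketch, the standard library's `zipapp` module can pack a directory containing a `__main__.py` into a single `.pyz` file that any compatible Python interpreter can run. The directory layout, file names and message below are hypothetical.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run() -> str:
    """Bundle a tiny app into one .pyz file, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical app: a directory whose entry point is __main__.py.
        app_dir = Path(tmp) / "app"
        app_dir.mkdir()
        (app_dir / "__main__.py").write_text('print("hello from the bundle")\n')

        # Pack the whole directory into a single executable archive.
        target = Path(tmp) / "app.pyz"
        zipapp.create_archive(app_dir, target)

        # Run the single file with the current interpreter.
        completed = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout.strip()


output = build_and_run()
```

Pure-Python dependencies can be installed into the app directory with `pip install --target` before creating the archive. The target machine still needs a Python interpreter, so this is only one of several ways to get a single-file bundle.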