Logo Logo
  • About
  • Consortium
  • News and Events
    • News
    • Events
    • Blog
    • Media
  • Resources
  • HPC Observatory
  • Contact

Contact Info

  • Email Info@gmail.com
  • Phone +123 456 7890
  • Office Hours Sat - Wed : 8:00 - 4:00

Additional Links

Connect With Us

Managing Data and Machine Learning Models in HPC Applications

  • Home
  • Blog Details
November 21 2022
November 21, 2022
Blogdata learningHPCmachine learning
  • Blog

The synergy of data science (including big data and machine learning) and HPC yields many benefits for data-intensive applications in terms of more accurate predictive data analysis and better decision making. For instance, in the context of the HPDaSc (High Performance Data Science) project between Inria and Brazil, we have shown the importance of realtime analytics to make critical high-consequence decisions in HPC applications, e.g., preventing useless drilling based on a driller’s realtime data and realtime visualization of simulated data, or the effectiveness of ML to deal with scientific data, e.g., computing Probability Density Functions (PDFs) over simulated seismic data using Spark.

However, to realize the full potential of this synergy, ML models (or models for short) must be built, combined and ensembled, which can be very complex as there can be many models to select from. Furthermore, they should be shared and reused, in particular, in different execution environments such as HPC or Spark clusters.

To address this problem, we proposed Gypscie [Porto 2022, Zorrilla 2022], a new framework that supports the entire ML lifecycle and enables model reuse and import from other frameworks. The approach behind Gypscie is to combine several rich capabilities for model and data management, and model execution, which are typically provided by different tools, in a unique framework. Overall, Gypscie provides: a platform for supporting the complete model life-cycle, from model building to deployment, monitoring and policies enforcement; an environment for casual users to find ready-to-use models that best fit a particular prediction problem, an environment to optimize ML task scheduling and execution; an easy way for developers to benchmark their models against other competitive models and improve them; a central point of access to assess models’ compliance to policies and ethics and obtain and curate observational and predictive data; provenance information and model explainability. Finally, Gypscie interfaces with multiple execution environments to run ML tasks, e.g., an HPC system such as the Santos Dumont supercomputer at LNCC or a Spark cluster. 

Gypscie comes with SAVIME [Silva 2020], a multidimensional array in-memory database system for importing, storing and querying model (tensor) data. The SAVIME open-source system has been developed to support analytical queries over scientific data. Its offers an extremely efficient ingestion procedure, which practically eliminates the waiting time to analyze incoming data. It also supports dense and sparse arrays and non-integer dimension indexing. It offers a functional query language processed by a query optimiser that generates efficient query execution plans.

 

References

[Porto 2022] Fabio Porto, Patrick Valduriez: Data and Machine Learning Model Management with Gypscie. CARLA 2022 – Workshop on HPC and Data Sciences meet Scientific Computing, SCALAC, Sep 2022, Porto Alegre, Brazil. pp.1-2. 

[Zorrilla 2022] Rocío Zorrilla, Eduardo Ogasawara, Patrick Valduriez, Fabio Porto: A Data-Driven Model Selection Approach to Spatio-Temporal Prediction. SBBD 2022 – Brazilian Symposium on Databases, SBBD, Sep 2022, Buzios, Brazil. pp.1-12. 

[Silva 2020] A.C. Silva, H. Lourenço, D. Ramos, F. Porto, P. Valduriez. Savime: An Array DBMS for Simulation Analysis and Prediction. Journal of Information Data Management 11(3), 2020. 

 

By LNCC and Inria 

Return
Blogdata learningHPCmachine learning
Logo

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016478.
DISCLAIMER: The sole responsibility for the content lies with the authors. It does not necessarily reflect the opinion of the European Commission (EC). The EC is not responsible for any use that may be made of the information contained therein.

Contacts

  • risc2-project@bsc.es
  • +34 93 413 77 16
  • Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain

GDPR Compliant:

  • Privacy Policy
  • Cookie Policy

Copyright © 2021. All Rights Reserved. Developed by SUBA.PT

We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL the cookies. However, you may click "Read More" to know everything about how we use cookies.
Read MoreAccept
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
CookieDurationDescription
cookielawinfo-checkbox-analytics11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional11 monthsThe cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy11 monthsThe cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytics
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Others
Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.
SAVE & ACCEPT