#hdf5


#HDF5 is doing great. So basically:

1. Originally, upstream used autotools. That build system installed an h5cc wrapper which, besides being a compiler wrapper, had a few config-tool style options.
2. Then, upstream added a #CMake build system as an alternative. It installed a different h5cc wrapper that no longer had the config-tool style options.
3. Downstreams that tried CMake quickly discovered that the new wrapper broke a lot of packages, so they reverted to autotools and reported a bug.
4. Upstream closed the bug, handwaving it as "CMake h5cc changes have been noted in the Release.txt at the time of change - archived copy should exist in the history files."
5. Upstream announced plans to remove autotools support.

So, to summarize the current situation:

1. Pretty much everyone (at least #Arch, #Conda-forge, #Debian, #Fedora, #Gentoo) is building using autotools, because CMake builds cause too much breakage.
2. Downstreams originally judged this to be a HDF5 issue, so they didn't report bugs to affected packages. Not sure if they're even aware that HDF5 upstream rejected the report.
3. All packages remain "broken", and I'm guessing their authors may not even be aware of the problem, because, well, as I pointed out, everyone is still using autotools, and nobody reported the issues during initial CMake testing.
4. I'm not even sure there is a good "fix" here. I honestly don't know the package, but it sounds like the config-tool was removed with no replacement, so the only way forward might be for people to switch over to CMake (sigh), which would of course break the packages almost everywhere unless they also add fallbacks for compatibility with autotools builds.
5. The upstream's attitude suggests that HDF5 is pretty much a project unto itself, and doesn't care about its actual users.
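To make the breakage concrete: a downstream build script that relied on the autotools h5cc has to grow a fallback. A minimal sketch of what that might look like, assuming `h5cc -show` (the autotools option that prints the underlying compile command) and a pkg-config fallback; the function names and the flag filtering are my own, not anything HDF5 ships:

```python
import shutil
import subprocess

def parse_show_output(line: str) -> list[str]:
    """Pick the -I/-L/-l entries out of the compile command that
    an autotools-built `h5cc -show` prints."""
    return [tok for tok in line.split()
            if tok.startswith(("-I", "-L", "-l"))]

def hdf5_flags() -> list[str]:
    """Try h5cc first; fall back to pkg-config if h5cc is missing
    or is the stripped CMake-built wrapper."""
    if shutil.which("h5cc"):
        try:
            out = subprocess.run(["h5cc", "-show"],
                                 capture_output=True, text=True, check=True)
            flags = parse_show_output(out.stdout)
            if flags:
                return flags
        except subprocess.CalledProcessError:
            pass  # CMake-built h5cc: config-tool options gone
    # Fallback: some distro packagings of HDF5 install a .pc file
    out = subprocess.run(["pkg-config", "--cflags", "--libs", "hdf5"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()
```

Which is exactly the kind of double bookkeeping every downstream would need to carry if autotools support actually goes away.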

github.com/HDFGroup/hdf5/issue

When building hdf5 with autotools, the following file is used to produce h5cc and friends: https://github.com/HDFGroup/hdf5/blob/develop/bin/h5cc.in However, when building with cmake, the following...
GitHub: h5cc is severely lacking when building hdf5 with cmake, breaking downstream users · Issue #1814 · HDFGroup/hdf5 (by BtbN)

Three Ways of Storing and Accessing Lots of Images in #Python
realpython.com/storing-images-

Using plain files, #LMDB, and #HDF5. It's too bad there's an explicit serialization step for the LMDB case. In C we'd just splat the memory in and out of the DB as-is, with no ser/deser overhead.

Also they use two separate tables for image and metadata in HDF5, but only one table in LMDB (with metadata concat'd to image). I don't see why they didn't just use two tables there as well.
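The "splat the memory" point works from Python too: a raw pixel buffer can go into an LMDB-style key-value store as plain bytes, with no pickling on either side. A sketch with NumPy, where a dict stands in for the `lmdb` environment (with the real package you would use `txn.put`/`txn.get`), and the convention of passing shape/dtype out-of-band is my own assumption:

```python
import numpy as np

# Stand-in for an LMDB environment: keys and values are raw bytes.
store: dict[bytes, bytes] = {}

def put_image(key: bytes, img: np.ndarray) -> None:
    # Store the pixel buffer as-is: no pickle, no JSON, just the
    # one copy tobytes() makes.
    store[key] = img.tobytes()

def get_image(key: bytes, shape: tuple[int, ...], dtype=np.uint8) -> np.ndarray:
    # Reinterpret the stored bytes directly; shape and dtype must be
    # known out-of-band (fixed by the dataset, or a tiny header).
    return np.frombuffer(store[key], dtype=dtype).reshape(shape)

img = np.arange(32 * 32 * 3, dtype=np.uint8).reshape(32, 32, 3)
put_image(b"img-0", img)
assert np.array_equal(get_image(b"img-0", (32, 32, 3)), img)
```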

realpython.com: Three Ways of Storing and Accessing Lots of Images in Python – Real Python
In this tutorial, you'll cover three ways of storing and accessing lots of images in Python. You'll also see experimental evidence for the performance benefits and drawbacks of each one.

New software descriptor published on ing.grid! "h5RDMtoolbox - A Python Toolbox for FAIR Data Management around HDF5" by Matthias Probst and Balazs Pritz inggrid.org/article/id/4028/ #RDM #HDF5 #python #FAIR #kit

ing.grid: h5RDMtoolbox - A Python Toolbox for FAIR Data Management around HDF5
Sustainable data management is fundamental to efficient and successful scientific research. The FAIR principles (Findable, Accessible, Interoperable and Reusable) have been proven to be successful guidelines to enable comprehensible analysis, discovery and re-use. Although the topic has recently gained increasing awareness in both academia and industry, the engineering sciences in particular are lagging behind in managing the valuable asset of data. While large collaborations and research facilities have already implemented metadata strategies, smaller research groups and institutes are often missing a common strategy due to heterogeneous and rapidly changing environments as well as missing capacity or expertise. This paper presents an open-source package, called h5RDMtoolbox, written in Python helping to quickly implement and maintain FAIR research data management along the entire data lifecycle using HDF5 as the core file format. One of the key features of the toolbox is the flexible, high-level implementation of metadata standards, adaptable to the changing requirements of projects, collaborations and environments, such as experimental or computational setups. Implementation of existing schemas such as EngMeta or the cf-conventions are possible and intended use-cases. Other benefits of the toolbox include a simplified interface to the data and database solutions to query metadata stored in HDF5 files.
Continued thread

NeXus is a data format widely used in our communities. It provides a standard for which parameters should be stored and how they should be structured within the #HDF5 file, so that metadata can be integrated alongside the data in a highly structured way.

Building on that, there is a machine-readable ontology; it defines unique identifiers and establishes a controlled vocabulary for the names of all experimental parameters and measured variables in an experiment.
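The NeXus idea, standard group names and attributes layered on top of plain HDF5, can be sketched with h5py. The `NX_class` and `signal` attributes are the NeXus tagging convention; the file name and the counts dataset below are made up for illustration:

```python
import h5py
import numpy as np

# Minimal NeXus-flavoured layout: plain HDF5 groups tagged with
# NX_class attributes so tools know how to interpret them.
with h5py.File("scan.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"

    data = entry.create_group("data")
    data.attrs["NX_class"] = "NXdata"
    data.attrs["signal"] = "counts"   # names the plottable dataset

    counts = data.create_dataset("counts", data=np.arange(10))
    counts.attrs["units"] = "counts"  # metadata stored next to the data
```

Any generic HDF5 browser still sees an ordinary file; NeXus-aware tools additionally know which group is the entry and which dataset is the signal.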

2/3

Yesterday, Rolf and Heike presented our Daphne consortium at the #NFDI network meeting Berlin-Brandenburg. In particular, they covered the terminologies and data formats used for #synchrotron and #neutron experiments.

1/3

The slides are available on Zenodo:
zenodo.org/records/12728050

Zenodo: Terminologies in Photon and Neutron Sciences
Slides for a presentation at the Second NFDI Berlin-Brandenburg network meeting on Ontologies and Knowledge Graphs, 11 July 2024

STOP DOING HDF5

- Years of HDF5 yet no real-world use-case found
- Files were never meant to be hierarchical or "self explaining"
- "Hello I would like a file system within a file please" ...statements dreamt up by the utterly deranged
- Look at what the HDF5 group has been demanding our respect for all this time
- wanted to have an explanation for the stuff in a folder? We had a tool for that: it was called a Readme

They have played us for absolute fools

New preprint available: h5RDMtoolbox - A Python Toolbox for FAIR Data Management around HDF5.

"This paper presents an open-source package, called h5RDMtoolbox, written in Python helping to quickly implement and maintain FAIR research data management along the entire data lifecycle using HDF5 as the core file format"

preprints.inggrid.org/reposito

preprints.inggrid.org: h5RDMtoolbox - A Python Toolbox for FAIR Data Management around HDF5

This week, I'm at @DESY for the #HDF User Group #HUG summit on plugins and data compression.

We will start with latest updates on plugins:
"HDF5 and plugins – overview and roadmap", Dana Robinson.
Then, Elena Pourmal will hopefully have good news: "Expanding HDF5 capabilities to support multi-threading access and new types of storage".

Currently, reading big datasets can be quite a burden due to some … challenges of #HDF5 with multi-threading.

I'm at ESTEC (ESA's European Space Research and Technology Centre) in the Netherlands this week for my first #PLATO science working team meeting as community scientist. One interesting discussion has been on astronomical file formats (honestly)... Future NASA missions look like they might move from #fits to #ASDF, while other astronomical surveys have started using #HDF5. But it's tough to try to predict exactly where the community will be in 5-10 years time. Input welcome!