Handling #GED (electronic document management) with Apache Airflow
Thursday, May 15 at 12:00
Hôtel de la Métropole, Lyon
Presented by @jeremielesage
https://jeci.fr/fr/evenements/rpll-atelier-2025/
#LogicielLibre #floss #ApacheAirflow
Ah yes, because nothing says "simple" like melding Apache Airflow and LLM workflows with a side of AI Agents. Just what every developer dreams of: wrestling with Pydantic AI while pretending to control the automation beast.
https://github.com/astronomer/airflow-ai-sdk #ApacheAirflow #LLMworkflows #AIAgents #PydanticAutomation #DeveloperDreams #HackerNews #ngated
LLM Workflows then Agents: Getting Started with Apache Airflow
Big data can be difficult data.
Your pipeline takes several hours to process terabytes of data... and then something goes wrong. You have to start over.
Incremental loading can help!
Conquer your big data by breaking it down in your Airflow DAGs. You'll reclaim your time, lower your cloud bill, and maybe even lower your cholesterol:
Read the blog: https://kpdata.dev/blog/airflow-incremental-loading/
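As a rough illustration of the idea in the post above (not taken from the linked blog), here is a minimal pure-Python sketch of incremental loading: the date range is split into daily windows, each window is loaded on its own, and completed windows are remembered so a rerun after a failure skips them. The names `daily_windows`, `load_incrementally`, `load_window`, and `done` are hypothetical. In Airflow itself, the same idea is usually expressed through each DAG run's data interval rather than a hand-rolled loop.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end) one day at a time."""
    current = start
    while current < end:
        nxt = min(current + timedelta(days=1), end)
        yield current, nxt
        current = nxt


def load_incrementally(start, end, load_window, done):
    """Load every window not already in `done`.

    `load_window` is a hypothetical callable that processes one window.
    A window is added to `done` only after it succeeds, so a rerun
    after a failure resumes from the first window that did not finish.
    """
    for window in daily_windows(start, end):
        if window in done:
            continue  # already loaded in an earlier run
        load_window(*window)
        done.add(window)
    return done
```

If one window fails, only that window and the ones after it need to run again; the hours already spent on earlier windows are not lost.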
It's time for the welcome cocktail served by Satya!
- a measure of #ModernDataStack,
- a dash of geo,
- a zest of Open Source,
- and lots of love.
This recipe is served in this article, which details how the Gard department puts its geodata to use.
https://geotribu.fr/articles/2025/2025-02-25_stack_data_gard/
Proofreading: @geojulien & Michaël Galien
#PostgreSQL #PostGIS #GDAL #OGR #DBT #Metabase #ApacheAirflow
Stuck on when to run that pipeline again? I've been there too many times!
Scheduling data pipelines can be a complex puzzle: time-based, frequency-based, event-driven... there are so many options. Let's unravel the mystery together!
Discover the methods for scheduling Airflow DAGs and make your data engineering life simpler.
Read the blog: https://kpdata.dev/blog/airflow-scheduling/
Just caught up with the recent Delta Lake webinar,
> Revolutionizing Delta Lake workflows on AWS Lambda with Polars, DuckDB, Daft & Rust
Some interesting hints there regarding lightweight processing of big-ish data. Easy to relate to any other framework instead of Lambda, e.g. #ApacheAirflow tasks
Azure Data Factory And Apache Airflow Integration Flaws Let Attackers Gain Write Access https://cybersecuritynews.com/azure-airflow-security-flaw/ #InformationSecurityNews #SecurityVulnerabilities #CyberSecurityNews #VulnerabilityNews #AzureDataFactory #ApacheAirflow #vulnerability #cloud
If you're using #ApacheAirflow, I'm interested in hearing your use cases for it.
Apache Airflow: Orchestrating complex workflows made easy
If you're looking for a solution to manage complex data pipelines, Apache Airflow is a strong choice! In our new article, our developer shows how Airflow works, what advantages it offers, and how it is used in areas such as machine learning. With code examples, he shows how you can set up your own workflows efficiently.
Have you already worked with Apache Airflow? What has your experience been? Let's talk about it!
https://www.elinext.de/blog/einsatz-von-apache-airflow-mit-maschinellem-lernen/
"How to build a data extraction pipeline with Apache Airflow"
Déborah Mesquita
Still, #dagster has fewer dependencies, and after some battling and downgrading versions, I managed to start the dev environment...
Building an #Apacheairflow container is pure chaos. Installing via pip is another kind of hell, with broken builds all over the place (google-re2)... For a tool that has been on the market for 9 years, the installation process is overwhelming.
At least I can make #dagster run.
I guess we are all ill served by workflow orchestration tools anyway, in the open source world.
I take back what I said yesterday... #dagster installation process is quite clumsy. And I detected some issues in terms of documentation/instructions.
I managed to install and run #Apacheairflow in 2 clicks...
Going to #PyConUS for the first time. Looking forward to actually meeting the community folks. Just in time to gear up for development streamlining discussions for the upcoming #apacheAirflow 3.0.
Just wanted to take a moment to give a shout out to the @airflow team. I have been using Airflow for just under two years, and the number of improvements I have seen during my relatively short tenure is phenomenal. Thanks for all your hard work, and to everyone who contributes! ^_^
Tenable Research discovered a one-click account takeover vulnerability in the AWS Managed Workflows for Apache Airflow (MWAA) service that could have resulted in remote code execution (RCE) on the underlying instance and in lateral movement to other services. Additional research revealed that numerous shared-parent service domains in AWS, Azure and GCP were misconfigured, putting cloud customers at considerable risk. No CVE ID associated. https://www.tenable.com/blog/flowfixation-aws-apache-airflow-service-takeover-vulnerability-and-why-neglecting-guardrails
FlowFixation: AWS Apache Airflow Service Takeover Vulnerability
Date: March 21, 2024
CVE: Not specified
Sources: Tenable Blog
Issue Summary
Tenable Research discovered a vulnerability, named FlowFixation, in AWS Managed Workflows for Apache Airflow (MWAA) that could allow session hijacking leading to a full takeover of the victim's web management panel.
Technical key findings
FlowFixation combines session fixation and XSS via an AWS domain misconfiguration, enabling attackers to authenticate sessions they already know and gain control over victims' Apache Airflow management panels.
Vulnerable products
Impact assessment
Potential for remote code execution on underlying instances and lateral movement to other services.
Patches or workaround
AWS has addressed the vulnerability. Users should ensure they are using updated services.
Tags
What could be more satisfying than a #workflow #data all in #ApacheAirflow!?
Just published my first blog post on Apache Airflow: a zero-to-one introduction to this tool. It details how to get it up and running using Docker, guides you through creating and scheduling simple pipelines, and shows how to keep an eye on them with the Airflow UI.
https://www.franciscoyira.com/post/data-pipelines-cloud-intro-airflow-docker/
#ApacheAirflow #DataScience