#textmining


📯 This week at #DigitalHistoryOFK: Torsten Hiltmann and @DigHisNoah present "RAG den Spiegel", an innovative RAG system for analyzing the SPIEGEL archive. The talk shows how #LLMs are changing historical scholarship and combining hermeneutic with computational methods.
📅 June 25, 4-6 p.m., online (access on request)
ℹ️ Abstract: dhistory.hypotheses.org/10912 #TextMining #4memory #DigitalHistory @historikerinnen @histodons @digitalhumanities

Folks working in the #DigitalHumanities or #TextMining and related research fields, a technical question: do you use a database management system (DBMS) to store your data? Or do you use good old JSON or CSV files on local drives? If the first, what do you use (Postgres, MySQL, Mongo)? If the second, how do you sync your files to enable collaboration on the same data?

I'm starting a new project, and from past experience I think it would be best to set up a managed DB from the beginning, instead of using JSON files. That way my team has access to the same data and we can query the specific data we need for some analysis.
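To make the trade-off in the question concrete, here is a minimal sketch of moving JSON records into a database and querying only the subset an analysis needs. Everything in it is an assumption for illustration: the sample records, the `documents` table, and the helper names are hypothetical, and SQLite (in memory) is used only because it ships with Python. A managed Postgres or MySQL instance would be reached through its own driver instead.

```python
import json
import sqlite3

# Hypothetical sample records standing in for a project's JSON files.
RECORDS_JSON = """
[
  {"id": 1, "title": "Letter to the editor", "year": 1848, "body": "first text"},
  {"id": 2, "title": "Harbour report", "year": 1851, "body": "second text"},
  {"id": 3, "title": "Council minutes", "year": 1848, "body": "third text"}
]
"""


def load_records(conn, records):
    """Create the (hypothetical) documents table and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT, year INTEGER, body TEXT)"
    )
    # Named placeholders are filled from each record's keys.
    conn.executemany(
        "INSERT INTO documents (id, title, year, body) "
        "VALUES (:id, :title, :year, :body)",
        records,
    )
    conn.commit()


def titles_from_year(conn, year):
    """Return the titles of all documents from one year, in id order."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE year = ? ORDER BY id", (year,)
    )
    return [row[0] for row in rows]


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
load_records(conn, json.loads(RECORDS_JSON))
print(titles_from_year(conn, 1848))
```

With a shared server in place of the in-memory database, each team member would run the same kind of query against one copy of the data rather than syncing files.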

Open Access book edited by Silke Schwandt: Digital Methods in the Humanities.
Explore interdisciplinary challenges, case studies, and innovative perspectives on digital tools in textual research.
Includes: From Serial Sources to Modeled Data, OCR, text mining & more.
transcript-verlag.de/978-3-837
#DigitalHumanities #OpenAccess #DigitalMethods #TextMining #HumanitiesResearch #SilkeSchwandt #transcriptVerlag

transcript Verlag
Digital Methods in the Humanities
Volume 1 of »Digital Humanities Research« offers a unique perspective on digital methods for and in the humanities.

Code4Lib: Distant Listening: Using Python and Apps Scripts to Text Mine and Tag Oral History Collections. “Designed for oral history project managers, the workflow empowers student workers to generate, modify, and expand subject tags during transcription editing, thereby enhancing the overall accuracy and discoverability of the collection. The paper details the workflow, surveys challenges […]

https://rbfirehose.com/2025/04/15/distant-listening-using-python-and-apps-scripts-to-text-mine-and-tag-oral-history-collections-code4lib/

Resulting from an @snsf_ch SPARK grant, this took some time to mature, but the outcome is very informative and builds a foundation for where to head next: how to liberate the facts and information locked in the published literature. #textmining #biodiversity preprints.arphahub.com/article

ARPHA Preprints
From literature to biodiversity data: mining arthropod organismal and ecological traits with machine learning
The fields of taxonomy and biodiversity research have witnessed an exponential growth in published literature. This vast corpus of articles holds information on the diverse biological traits of organisms and their ecologies. However, access to and extraction of relevant data from this extensive resource remain challenging. Advances in text and data mining (TDM) and Natural Language Processing (NLP) techniques offer new opportunities for liberating such information from the literature. Testing and using such approaches to annotate articles in machine actionable formats is therefore necessary to enable the exploitation of existing knowledge in new biology, ecology, and evolution research. Here we explore the potential of these methods to annotate and extract organismal and ecological trait data for the most diverse animal group on Earth, the arthropods. The article processing workflow uses manually curated trait dictionaries with trained NLP models to perform labelling of entities and relationships of thousands of articles. A subset of manually annotated documents facilitated the formal evaluation of the performance of the workflow in terms of entity recognition and normalisation, and relationship extraction, highlighting several important technical challenges. The results are made available to the scientific community through an interactive web tool and queryable resource, the ArTraDB Arthropod Trait Database. These methodological explorations provide a framework that could be extended beyond the arthropods, where TDM and NLP approaches applied to the taxonomy and biodiversity literature will greatly facilitate data synthesis studies and literature reviews, the identification of knowledge gaps and biases, as well as the data-informed investigation of ecological and evolutionary trends and patterns.
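
The dictionary-driven part of such a workflow can be illustrated with a toy matcher. This is only a sketch under stated assumptions, not the authors' pipeline: the abstract describes curated dictionaries combined with trained NLP models, whereas the code below does plain term lookup, and the dictionary entries, label scheme, and function name are all hypothetical.

```python
import re

# Hypothetical trait dictionary: surface term -> normalised trait label.
TRAIT_DICTIONARY = {
    "herbivorous": "diet:herbivore",
    "predatory": "diet:predator",
    "nocturnal": "activity:nocturnal",
}


def label_traits(text, dictionary=TRAIT_DICTIONARY):
    """Return (start, end, surface form, label) for each dictionary term in text."""
    # Longest terms first, so a longer entry wins over a shorter one it contains.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "The larvae are herbivorous, while adults are nocturnal."
for hit in label_traits(sentence):
    print(hit)
```

Plain lookup like this misses synonyms, negation, and the relationships between entities, which is where the trained models mentioned in the abstract would come in.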