#textanalysis

Useful contribution to discussions in this area, for sure! The results highlight "whether an automated approach that would still require micromanaging and adjusting several variables by the human researcher would, in fact, be more efficient an approach compared to the same tasks performed manually by human labour"

Out of Context! Managing the Limitations of Context Windows in #ChatGPT-4o Text Analyses doi.org/10.46298/jdmdh.15090 #DigitalHumanities #TextAnalysis #LLM #ArtificialIntelligence #GLAMR

Episciences
Out of Context! Managing the Limitations of Context Windows in ChatGPT-4o Text Analyses
In recent years, large language model (LLM) applications have surged in popularity, and academia has followed suit. Researchers frequently seek to automate text annotation – often a tedious task – and, to some extent, text analysis. Notably, popular LLMs such as ChatGPT have been studied as both research assistants and analysis tools, revealing several concerns regarding transparency and the nature of AI-generated content. This study assesses ChatGPT’s usability and reliability for text analysis – specifically keyword extraction and topic classification – within an “out-of-the-box” zero-shot or few-shot context, emphasizing how the size of the context window and varied text types influence the resulting analyses. Our findings indicate that text type and the order in which texts are presented both significantly affect ChatGPT’s analysis. At the same time, context-building tends to be less problematic when analyzing similar texts. However, lengthy texts and documents pose serious challenges: once the context window is exceeded, “hallucinated” results often emerge. While some of these issues stem from the core functioning of LLMs, some can be mitigated through transparent research planning.
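
The abstract notes that results degrade once the context window is exceeded. One common workaround, which is not necessarily what the paper recommends, is to split a long document into overlapping chunks that each stay under a fixed budget. The sketch below assumes whitespace-separated words as a rough stand-in for model tokens; `chunk_words`, `max_words`, and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words.

    Assumption: words approximate tokens. A real tokenizer would
    usually count more tokens than words for the same text.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, max_words=4, overlap=1):
        print(chunk)
```

The overlap keeps a little shared text between neighbouring chunks, so a sentence cut at a boundary still appears whole in at least one of them.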

Mastering these core NLP techniques is crucial for any data scientist dealing with text data. From tokenization to language modeling, each method serves a unique purpose in processing, analyzing, and extracting valuable insights from textual information.

#NLP #DataScience #Tokenization #LanguageModeling #TextAnalysis #TextMining #MachineLearning

read more: blogulr.com/khushnuma7861/topn
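
To make "from tokenization to language modeling" concrete, here is a minimal sketch using only the Python standard library. It is not taken from the linked article: the regex tokenizer and the unsmoothed bigram model are deliberately simple assumptions, and `tokenize` and `bigram_model` are hypothetical helper names.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigram_model(tokens):
    """Estimate P(next word | previous word) from raw bigram counts (no smoothing)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {word: n / sum(following.values()) for word, n in following.items()}
        for prev, following in counts.items()
    }


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The cat ran!")
    model = bigram_model(tokens)
    print(tokens)
    print(model["cat"])
```

In this toy corpus, "cat" is followed once by "sat" and once by "ran", so the model gives each a probability of 0.5.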

As we found in “Your Health vs. My Liberty” (doi.org/10.1016/j.cognition.20), Yael Rozenblum et al. found that compliance with #publicHealth guidance correlated with indicators of the perceived threat of a viral pandemic.

Also, relying on #misinformation correlated with reliance on simple (vs. complex) #reasoning.

The free paper: doi.org/10.1002/tea.21975

Have you ever wanted to use an #LLM as one step in a workflow?

We integrated #GPT into the open-source analysis platform #useGalaxy, where you can link GPT to several thousand other tools, add more attachments for analysis and make your research reproducible.

galaxyproject.org/news/2024-09

In our example, we uploaded an audio file and used #Whisper to convert it into text, cut out the moderation, and prompted ChatGPT to translate it into German.

#DH #textanalysis #tools
@galaxyfreiburg

galaxyproject.org
Using Large Language Models in complex workflows
Use ChatGPT in your analysis on the Galaxy Server to leverage the Large Language Model in your automated workflows

📚🇮🇹 New working paper: "Evaluating Embedding Models for Clustering Italian Political News"

This study compares embedding models for unsupervised clustering of Italian political news shared on Facebook before the 2018 and 2022 elections, aiming to advance NLP methods for political text analysis in non-English languages.

Paper: osf.io/preprints/osf/2j9ed

Code & data: github.com/fabiogiglietto/Sema

Feedback welcome!
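
For readers new to embedding-based clustering, the basic idea can be sketched in a few lines. This is not the paper's method: it swaps in a simple greedy cosine-similarity grouping, and the toy 2-D vectors stand in for real embeddings. `cosine`, `cluster_by_threshold`, and the `threshold` value are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_threshold(vectors, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed vector is
    similar enough, otherwise start a new cluster with this vector as seed."""
    seeds = []
    labels = []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


if __name__ == "__main__":
    # Toy "embeddings": two items near the x-axis, two near the y-axis.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(cluster_by_threshold(toy))
```

Comparing embedding models then amounts to running the same clustering step on vectors from each model and checking which grouping best matches human judgement.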

Just launched: pycpidr 🎉
github.com/jrrobison1/pycpidr

Python library to determine the propositional idea density of an English text automatically.

Idea density is a measure of the amount of information conveyed relative to the number of words used. This metric has applications in various fields, including linguistics, cognitive science, and healthcare research.
#Python #Linguistics #psychometrics #NLP #TextAnalysis #OpenSource
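
As a rough illustration of the metric, and not of pycpidr's actual API, idea density can be approximated as the share of words that carry a proposition. The sketch below assumes tokens are already part-of-speech tagged with Universal POS labels and counts verbs, adjectives, adverbs, adpositions, and conjunctions as propositions; real tools add adjustment rules on top of this. `PROPOSITION_TAGS` and `idea_density` are hypothetical names.

```python
# Assumption: these Universal POS tags approximate "proposition" words.
PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}


def idea_density(tagged_tokens):
    """Return propositions / words for a list of (word, tag) pairs.

    Punctuation is excluded from the word count.
    """
    words = [(word, tag) for word, tag in tagged_tokens if tag != "PUNCT"]
    if not words:
        return 0.0
    propositions = sum(1 for _, tag in words if tag in PROPOSITION_TAGS)
    return propositions / len(words)


if __name__ == "__main__":
    # Hand-tagged example sentence; a real pipeline would use a POS tagger.
    sentence = [
        ("The", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
        ("slept", "VERB"), ("quietly", "ADV"), (".", "PUNCT"),
    ]
    print(idea_density(sentence))
```

Here three of the five words ("old", "slept", "quietly") count as propositions, giving a density of 3/5.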

😓 An NLP-Based System for Detecting Depression Levels through User Comments on Twitter (X)

mdpi.com/2227-7390/12/13/1926

MDPI
Mental-Health: An NLP-Based System for Detecting Depression Levels through User Comments on Twitter (X)
The early detection of depression in a person is of great help to medical specialists since it allows for better treatment of the condition. Social networks are a promising data source for identifying individuals who are at risk for this mental disease, facilitating timely intervention and thereby improving public health. In this frame of reference, we propose an NLP-based system called Mental-Health for detecting users’ depression levels through comments on X. Mental-Health is supported by a model comprising four stages: data extraction, preprocessing, emotion detection, and depression diagnosis. Using a natural language processing tool, the system correlates emotions detected in users’ posts on X with the symptoms of depression and provides specialists with the depression levels of the patients. By using Mental-Health, we described a case study involving real patients, and the evaluation process was carried out by comparing the results obtained using Mental-Health with those obtained through the application of the PHQ-9 questionnaire. The system identifies moderately severe and moderate depression levels with good precision and recall, allowing us to infer the model’s good performance and confirm that it is a promising option for mental health support.

@mtaylor_soc @rstats @sociology @academicchatter

More updates for #mappingtexts!

We're now sharing all the code from our book _Mapping Texts_: gitlab.com/culturalcartography

Each folder corresponds to a chapter, and each script corresponds (roughly) to a section.

We run diagnostics on the scripts periodically to catch any issues and update them accordingly. Updates are documented here: textmapping.com/fixes/

📣 Attention Linguistics & Digital Humanities students! 🎓📚
Join @janispagel and me for the »Prompting, Evaluation, Interpretation: An Introduction to LLMs in Text Analysis« course at the upcoming Deep Learning for Language Analysis Summer School in Cologne: ml-school.uni-koeln.de! 📝🔍
🗓️ Don't miss out – registration is open until June 16th! 🙌
#LLMs #TextAnalysis #NLP #AI #Linguistics #DigitalHumanities #CRETA
