New to Mastodon! Check out my open-source project, offline neural machine translation in Python: github.com/argosopentech/argos

@argosopentech Nice to see an open piece of translation software. Whatever is in my Linux distro was impossible to get running some time ago.

Is this "just" a coding/machine learning project for you or are you specially interested in translations? I am starting a project trying to stimulate the production of real translations of scientific articles by making them more discoverable.

@GrassrootsReview Hi, thanks! The easiest way to install it is probably with Snapcraft; if you have issues, make a GitHub issue and I can help. As for my interest, I think it's a little of both: my background is in software engineering, not translation, but this project, and machine translation/NLP in general, is something I have a long-term interest in.

@argosopentech Thanks.

In the city next door, Cologne, there is a company called DeepL, and their translations are astonishingly good for German to English and the other way around. For shorter written texts, say one A4 page, it often happens that I do not have to correct anything afterwards, and when I do it is mostly a matter of taste. It is really close to magic.

@argosopentech
Woah, I hope this can become the alternative to Google Translate I have been looking for to degoogle my personal life.

@argosopentech
I'm pretty far from programming but if it's offline and neural... Sounds cool

@argosopentech Hi, I wonder if you would be interested in integrating your neural network translation project with some morphological analysis software based on formal grammar models. I am building a language learning app (github.com/cdurden/aleksi, hosted at aleksi.org) and I think this combined approach would be effective.

@chrisdurden Hi, what would this involve? It's open source, so you're certainly welcome to use it, and I'm generally responsive to GitHub issues if you run into trouble. There's also an API available at libretranslate.com.

@argosopentech Does your software use any morphological tagging of words? Would it be straightforward to just run tagging software on the words and feed that extra data to the neural network?

@chrisdurden I don't know what morphological means. The software works by mapping one sequence of tokens in the source language into a translated sequence of tokens based on training data.

@argosopentech I don't think morphology is essential to the question I mean to ask.

Are tokens the same as words? I am going to assume so... Suppose for each word, you look up (in a dictionary for example) what part(s) of speech the word can be. This part of speech (noun, adjective, etc.) is an example of a tag. It provides some information about how the word can be used in a sentence. My question is this: Can you feed such a tag into your software along with each word?
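The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration: `TOY_LEXICON`, `tag_words`, and the tag names are hypothetical, and a real system would use a full dictionary or a trained tagger rather than a hand-written table.

```python
# Hypothetical toy lexicon mapping each word to the parts of speech it can take.
# The entries and tag names are made up for illustration.
TOY_LEXICON = {
    "the": {"DET"},
    "red": {"ADJ"},
    "book": {"NOUN", "VERB"},
    "on": {"ADP"},
    "table": {"NOUN", "VERB"},
}


def tag_words(sentence):
    """Return (word, possible_tags) pairs; words not in the lexicon get ['UNK']."""
    pairs = []
    for word in sentence.lower().split():
        tags = sorted(TOY_LEXICON.get(word, {"UNK"}))
        pairs.append((word, tags))
    return pairs
```

Calling `tag_words("The red book")` pairs each word with every tag the lexicon lists for it, so an ambiguous word such as "book" carries both NOUN and VERB.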

@chrisdurden I use SentencePiece for tokenization which makes "sub-word" tokens. It's a little complicated but tokens are not necessarily full words. I use Stanza to detect sentence boundaries but I think it also includes functionality to identify words and their parts of speech so this would probably be the easiest way to do what you're trying to do.
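To show what "sub-word" tokens look like, here is a toy greedy longest-match splitter. It is not SentencePiece's actual algorithm (SentencePiece learns its vocabulary from data); `TOY_VOCAB` and `toy_subword_tokenize` are hypothetical names, and the only convention borrowed from SentencePiece is marking the start of a word with "▁".

```python
# Hypothetical hand-written vocabulary; a real vocabulary is learned from a corpus.
TOY_VOCAB = {"▁the", "▁trans", "lat", "ion", "s"}


def toy_subword_tokenize(text, vocab=TOY_VOCAB):
    """Split text into sub-word pieces by greedy longest-prefix matching."""
    pieces = []
    for word in text.split():
        remaining = "▁" + word  # mark the word boundary
        while remaining:
            # Try the longest prefix first, shrinking until one is in the vocabulary.
            for end in range(len(remaining), 0, -1):
                if remaining[:end] in vocab:
                    pieces.append(remaining[:end])
                    remaining = remaining[end:]
                    break
            else:
                # No prefix matched: fall back to a single character.
                pieces.append(remaining[0])
                remaining = remaining[1:]
    return pieces
```

With this vocabulary, "translations" comes out as four pieces rather than one word, which is why a per-word tag does not line up one-to-one with the model's tokens.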

@chrisdurden You could potentially insert parts of speech as tokens, which are then removed from the output, to try to improve translation quality. I would guess this would not help, though: Google has a paper on their translation system and they don't do this: arxiv.org/abs/1609.08144 . The neural net doing the translation is large and is likely developing some understanding of parts of speech internally, maybe even better than a dictionary lookup could provide.
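A rough sketch of that idea, inserting a tag token before each source token and filtering tag tokens out of the output, might look like the following. The `<pos:...>` format and both function names are assumptions made for illustration; nothing here is part of the project's actual pipeline, and the model would have to be trained on data prepared the same way.

```python
# Hypothetical marker format for part-of-speech tokens.
TAG_PREFIX = "<pos:"


def add_tag_tokens(tokens, tags):
    """Interleave one tag token before each token: [<pos:DET>, the, <pos:NOUN>, book]."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    tagged = []
    for token, tag in zip(tokens, tags):
        tagged.append(f"{TAG_PREFIX}{tag}>")
        tagged.append(token)
    return tagged


def strip_tag_tokens(tokens):
    """Drop any tag tokens so only ordinary tokens remain in the output."""
    return [t for t in tokens if not (t.startswith(TAG_PREFIX) and t.endswith(">"))]
```

The round trip `strip_tag_tokens(add_tag_tokens(tokens, tags))` returns the original tokens, so the tags affect only what the model sees, not the final text.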

@chrisdurden Additionally, a major benefit of the current approach is that it is agnostic to the languages and just needs data.

@argosopentech Thanks for your response! Since I am interested in language learning, I am not thinking of using parts of speech to make machine translation more accurate. I am actually interested in building software that can take sentences and output grammatical constructs. Examples of grammatical constructs in English might be "prepositional phrase" or "possessive gerund", which can be recognized by specific tokens appearing in concert with specific parts of speech.
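As a minimal sketch of recognizing one such construct, the function below scans a list of (word, tag) pairs for a simple prepositional phrase: a preposition, an optional determiner, any number of adjectives, then a noun. The pattern, the tag names, and `find_prepositional_phrases` are all illustrative assumptions; real grammar is far less regular than this.

```python
def find_prepositional_phrases(tagged):
    """Return phrases matching ADP (DET)? (ADJ)* NOUN in a list of (word, tag) pairs."""
    phrases = []
    i = 0
    n = len(tagged)
    while i < n:
        if tagged[i][1] == "ADP":
            j = i + 1
            if j < n and tagged[j][1] == "DET":  # optional determiner
                j += 1
            while j < n and tagged[j][1] == "ADJ":  # zero or more adjectives
                j += 1
            if j < n and tagged[j][1] == "NOUN":  # the phrase must end in a noun
                phrases.append(" ".join(word for word, _ in tagged[i : j + 1]))
                i = j + 1
                continue
        i += 1
    return phrases
```

For "the book on the red table", tagged as DET NOUN ADP DET ADJ NOUN, this picks out "on the red table".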

@chrisdurden I see, good luck! I would look at the Stanza Python library; I think it may have some functionality of interest to you.
