Talk page for Wikislovník:Korpusy
Whether Google Books is a corpus
- https://www.google.com/search?q=%22google+books%22+corpus
- https://www.english-corpora.org/googlebooks/
- https://books.google.com/ngrams/
- Choose corpus
- American English
- etc.
- https://varieng.helsinki.fi/CoRD/corpora/GoogleBooks/
- Google Books Corpora
- 'Although this "corpus" is based on Google Books data, it is not an official product of Google or Google Books (citation). Rather it was created by Mark Davies, Professor of Linguistics at Brigham Young University, and it is related to other large corpora that we have created.'
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4596490/
- Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution
- "However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not."
- https://eadh.org/news/2011/05/13/google-books-corpus
- Google Books corpus
- "The Brigham Young University (in Provo, Utah) is pleased to announce a new corpus -- the Google Books (American English) corpus: http://googlebooks.byu.edu/."
From this it appears that the word "corpus" is used in both a broader and a narrower sense. --Dan Polansky (talk) 18 May 2023, 12:52 (CEST)
Whether corpora are manually annotated
- https://wiki.korpus.cz/doku.php/pojmy:anotace
- "The process by which interpretive linguistic data, structural data, and/or metatextual data are attached, manually or automatically, to the textual data of a corpus." Italics mine.
--Dan Polansky (talk) 18 May 2023, 12:18 (CEST)
What a corpus is
- https://wiki.korpus.cz/doku.php/pojmy:korpus
- "A language corpus (from Lat. corpus 'body, solid body') is an extensive collection of authentic texts (written or spoken) converted into electronic form in a uniform format so that linguistic phenomena, especially words and word combinations (collocations), can be easily searched for in it."
--Dan Polansky (talk) 20 May 2023, 08:07 (CEST)
Further links that answer the question:
- https://guides.library.ucla.edu/c.php?g=180293&p=1189870
- "Linguistic Corpora: A collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language (corpus linguistics)."
- https://www.english-linguistics.uni-mainz.de/corpus-linguistics/
- "Corpus linguistics is a methodology that involves computer-based empirical analyses (both quantitative and qualitative) of language use by employing large, electronically available collections of naturally occurring spoken and written texts, so-called corpora."
- https://www.press.umich.edu/pdf/9780472033850-part1.pdf
- "So what exactly is corpus linguistics? Corpus linguistics approaches the study of language in use through corpora (singular: corpus). A corpus is a large, principled collection of naturally occurring examples of language stored electronically."
- https://www1.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/introduction2.html
- "Computer-readable corpora can consist of raw text only, i.e. plain text with no additional information. Many corpora have been provided with some kind of linguistic information, here called mark-up or annotation."
--Dan Polansky (talk) 20 May 2023, 08:18 (CEST)
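The definitions quoted above describe a corpus as an electronic collection of texts in which words and collocations can be searched. As a rough illustration only, here is a minimal Python sketch of such a search. The three sentences in `TEXTS` and the helper names `tokenize`, `find_word`, and `right_collocates` are hypothetical and are not taken from any real corpus or corpus tool.

```python
import re
from collections import Counter

# Hypothetical toy "corpus": a stand-in for a real collection of authentic texts.
TEXTS = [
    "The corpus contains written texts.",
    "A spoken corpus contains transcribed speech.",
    "Each corpus contains authentic language.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def find_word(texts, word):
    """Return (text index, token position) for every occurrence of word."""
    hits = []
    for i, text in enumerate(texts):
        for j, token in enumerate(tokenize(text)):
            if token == word:
                hits.append((i, j))
    return hits

def right_collocates(texts, word):
    """Count the tokens that immediately follow word."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for current, following in zip(tokens, tokens[1:]):
            if current == word:
                counts[following] += 1
    return counts

print(find_word(TEXTS, "corpus"))
print(right_collocates(TEXTS, "corpus").most_common(3))
```

Real corpus tools index the texts rather than scanning them on every query; the linear scan here is only meant to make the idea of "searching for words and collocations" concrete.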
Whether a corpus must be annotated
- https://www1.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/introduction2.html
- "Computer-readable corpora can consist of raw text only, i.e. plain text with no additional information. Many corpora have been provided with some kind of linguistic information, here called mark-up or annotation."
Corpora can be (a) unannotated, (b) manually annotated, or (c) automatically (machine) annotated. --Dan Polansky (talk) 20 May 2023, 09:29 (CEST)
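To make the three cases concrete, here is a small Python sketch under loud assumptions: the sentence, the tag names, and the lookup table `TOY_LEXICON` are all invented for illustration, and the "automatic" annotator is a trivial dictionary lookup, not a real tagger.

```python
# Hypothetical toy lexicon standing in for an automatic tagger.
TOY_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}

def raw_corpus(text):
    """(a) Unannotated: plain tokens with no additional information."""
    return text.lower().split()

def auto_annotate(tokens, lexicon=TOY_LEXICON):
    """(c) Automatic annotation: attach a tag to each token by lookup."""
    return [(token, lexicon.get(token, "UNK")) for token in tokens]

tokens = raw_corpus("The dog barks")

# (b) Manual annotation: tags assigned by a person, hard-coded here.
manual = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]

automatic = auto_annotate(tokens)

print(tokens)
print(manual)
print(automatic)
```

In all three cases the underlying text is the same; only the presence and origin of the attached information differ, which fits the point that annotation is optional.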