  1. Word frequency list based on a 15 billion character corpus: BCC (BLCU ...

    Jun 15, 2018 · The Beijing Language and Culture University created a balanced corpus of 15 billion characters. It’s based on news (人民日报 1946-2018,人民日报海外版 2000-2018), literature (books …

  2. Bigrams sorted by frequency with pinyin & English?

    Jun 21, 2023 · The Beijing Language and Culture University created a balanced corpus of 15 billion characters. It’s based on news (人民日报 1946-2018,人民日报海外版 2000-2018), literature (books …

  3. Common Idioms; A Collection by Grade [HSK / old HSK / 中考 / 高考 / ...]

    Dec 27, 2019 · The corpus is much larger than the CCL (470 million characters), the CNC (100 million characters), the SUBTLEX-CH (47 million characters) and the LCMC (less than 2 million characters). …

  4. Media-related vocabulary gathering project - Pleco Software Forums

    Jan 15, 2020 · With a small corpus of 650 articles from People's Daily, downloaded using a Python script, I hope to start providing a more modern frequency list of media-related vocabulary. The …

  5. Wrong Cantonese Jyutping [lei5 --> incorrect] for 裡 [leoi5 --> correct ...

    Apr 4, 2025 · PyCantonese comes with one built-in corpus, the Hong Kong Cantonese Corpus. For corpora other than HKCanCor, PyCantonese provides the function read_chat() to read in Cantonese …

  6. Word frequency list based on a 15 billion character corpus: BCC (BLCU ...

    Jun 15, 2018 · I would read in the BCC corpus frequency list as a dictionary, then … Having concatenated all the news/magazine articles as plain text, I would build a dictionary of all the words in the …

  7. Integrating BCC Corpus Data into Dictionary - Pleco Software Forums

    Jan 3, 2019 · The BCC corpus seems to have pretty loose licensing terms. Pleco already seems to be using frequency data to sort the search results. Adding them meaningfully to dictionary definitions …

  8. Flashcards for TOCFL (2023), CCCC, TBCL - Pleco Software Forums

    Nov 7, 2023 · I've parsed out vocabulary from these Taiwanese tests and converted it to flashcards in Pleco's format. Useful e.g. for seeing term levels, intended part of speech and sometimes …

  9. Sentences flashcards generator (Python script) - Pleco Software Forums

    Dec 16, 2021 · The Beijing Language and Culture University created a balanced corpus of 15 billion characters. It’s based on news (人民日报 1946-2018,人民日报海外版 2000-2018), literature (books …

  10. www.plecoforums.com

    most_common_n_number_of_corpus_words = 40000  # Limit selection of corpus words to the first n most common words (all from the BCC corpus)
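
The setting above caps the corpus vocabulary at the n most frequent words. As a rough illustration only, here is a minimal sketch of how such a cap might be applied when reading a frequency list into a dictionary. The function name `load_frequency_list` and the tab-separated `word<TAB>count` line format are assumptions made for this example; they are not taken from the forum script or from the BCC download itself.

```python
# Hypothetical sketch: the line format ("word<TAB>count") and the helper
# name are assumptions, not the forum script's actual implementation.
most_common_n_number_of_corpus_words = 40000


def load_frequency_list(lines, limit=most_common_n_number_of_corpus_words):
    """Parse 'word<TAB>count' lines and keep the `limit` most frequent words."""
    counts = {}
    for line in lines:
        word, _, count = line.strip().partition("\t")
        # Skip blank or malformed lines instead of failing on them.
        if not word or not count.strip().isdigit():
            continue
        counts[word] = int(count)
    # Sort by count, highest first, then truncate to the requested size.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])
```

With a file on disk this could be called as `load_frequency_list(open(path, encoding="utf-8"))`, where `path` points at whatever frequency list is in use. Sorting before truncating means the cap holds even if the input is not already ordered by frequency.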