DictionWave

About

DictionWave is an NLP-based word-discovery tool built with Python, NumPy, Flask, and fastText word embeddings.

The number boxes above let you fine-tune the output. Output Size sets how many words each query returns. Rarity Boost helps you discover less frequently used words: the higher the value, the more a word's rarity is prioritised. A nonzero Randomness value shuffles the output words and introduces a chance that less similar words will be shown. A larger Randomness value increases both the likelihood and the degree to which output words are unrelated to the input word.
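The page does not say exactly how these three settings are combined. The sketch below shows one plausible way to realise them, under stated assumptions: each candidate's cosine similarity gets a rarity bonus scaled by Rarity Boost, and a nonzero Randomness adds Gaussian noise to the scores and shuffles the result. The function `rank_candidates` and its scoring formula are hypothetical, not DictionWave's implementation, and rarity is approximated by a word's position in a frequency-ordered vocabulary (an assumption).

```python
import numpy as np


def rank_candidates(similarities, ranks, output_size, rarity_boost=0.0,
                    randomness=0.0, rng=None):
    """Return indices of the words to output (hypothetical scoring sketch).

    score = similarity + rarity_boost * rarity (+ noise if randomness > 0),
    where rarity is the word's frequency rank scaled to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    similarities = np.asarray(similarities, dtype=float)
    # Assumption: a larger rank means a less frequent (rarer) word.
    rarity = np.asarray(ranks, dtype=float) / max(len(ranks) - 1, 1)
    scores = similarities + rarity_boost * rarity
    if randomness > 0:
        # Noise lets less similar words occasionally outscore closer matches;
        # a larger randomness makes this both more likely and more extreme.
        scores = scores + rng.normal(0.0, randomness, size=scores.shape)
    # Indices of the output_size highest scores, best first.
    top = np.argsort(scores)[::-1][:output_size]
    if randomness > 0:
        # A nonzero randomness also shuffles the order of the returned words.
        top = rng.permutation(top)
    return top


sims = [0.9, 0.8, 0.7, 0.1]  # cosine similarities to the query word
ranks = [0, 1, 2, 3]         # frequency ranks (0 = most common)
print(rank_candidates(sims, ranks, 2))                    # plain similarity order
print(rank_candidates(sims, ranks, 2, rarity_boost=1.0))  # rarer words pushed up
```

With Randomness at 0 this reduces to a plain similarity ranking (plus the rarity bonus), which matches the behaviour described above for the default case.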

All word data is derived from Common Crawl web data (600B tokens), and words are organised by the cosine similarity of their word vectors, which were created with the fastText library. Specifically, DictionWave uses fastText's crawl-300d-2M.vec resource, which contains two million words. This data was cleaned to remove duplicates, as well as entries deemed irrelevant or excessively offensive. For more technical details, please see Implementation Details on the DictionWave GitHub repository.
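To make the cosine-similarity step concrete, here is a minimal NumPy sketch, not DictionWave's own code. It parses the plain-text .vec format (a header line giving the word count and vector dimension, then one word followed by its vector components per line) and returns the nearest neighbours of a query word. The helper names `load_vec` and `most_similar` and the `limit` parameter are illustrative assumptions.

```python
import io
import numpy as np


def load_vec(lines, limit=None):
    """Parse fastText .vec text: a "count dim" header, then "word v1 ... vdim" lines."""
    lines = iter(lines)
    _, dim = map(int, next(lines).split())
    words, vectors = [], []
    for line in lines:
        if limit is not None and len(words) >= limit:
            break
        # rsplit keeps the word intact even if it contains a space.
        parts = line.rstrip().rsplit(" ", dim)
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, np.array(vectors, dtype=np.float32)


def most_similar(word, words, vectors, k=5):
    """Return the k words whose vectors have the highest cosine similarity to `word`."""
    index = {w: i for i, w in enumerate(words)}
    # Scale each row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    sims = unit @ unit[index[word]]
    sims[index[word]] = -np.inf  # never return the query word itself
    best = np.argsort(sims)[::-1][:k]
    return [(words[i], float(sims[i])) for i in best]


# Tiny in-memory example in .vec format (4 words, 3 dimensions).
sample = io.StringIO(
    "4 3\n"
    "cat 1 0 0\n"
    "kitten 0.9 0.1 0\n"
    "car 0 1 0\n"
    "truck 0 0.9 0.1\n"
)
words, vectors = load_vec(sample)
print(most_similar("cat", words, vectors, k=1)[0][0])  # kitten
```

For the real resource, one would pass an open file handle such as `open("crawl-300d-2M.vec", encoding="utf-8")` and probably set `limit`, since two million 300-dimensional float32 vectors alone occupy about 2.4 GB. Normalising the vectors once at load time, rather than on every query as in this sketch, would also avoid repeated work.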

Thank you for reading, and I hope you enjoy using DictionWave!