https://github.com/amitness/toolbox
Curated list of libraries for a faster machine learning workflow
toolbox
Curated libraries for a faster workflow
Phase: Data
Data Annotation
Datasets
Importing Data
Data Augmentation
Phase: Exploration
Data Preparation
Notebook Exploration
- View, diff and merge Jupyter notebooks through CLI: nbdime
- Parametrize notebooks: papermill
- Access notebooks programmatically: nbformat
- Convert notebooks to other formats: nbconvert
- Extra utilities not present in frameworks: mlxtend
- Maps in notebooks: ipyleaflet
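Tools like nbformat and nbconvert operate on the notebook file itself, which is plain JSON (a top-level `cells` list whose entries carry a `cell_type` and a `source`). The stdlib-only sketch below is not nbformat's API. It pulls out the code cells of a v4 notebook to show what "accessing notebooks programmatically" involves. The `code_cells` helper and the example notebook are hypothetical.

```python
import json


def code_cells(notebook_json: str) -> list[str]:
    """Return the source of every code cell in an nbformat v4 notebook (JSON text)."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # the format allows `source` to be a single string or a list of lines
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


# A tiny hand-written notebook, for illustration only
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

print(code_cells(example))  # ['x = 1\nprint(x)']
```

For real work, nbformat also validates the schema and handles version upgrades, which this sketch does not attempt.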
Phase: Feature Engineering
Feature Generation
Phase: Modeling
Model Selection
NLP
- Preprocessing: textacy
- Text Extraction from Image, Audio, PDF: textract
- Text generation: gpt2-client, textgenrnn, gpt-2-simple
- Text summarization: textrank, pytldr
- Spelling correction: JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages)
- Keyword extraction: rake, pke
- Multiple Choice Question Answering: mcQA
- Sequence to sequence models: headliner
- Transfer learning: finetune
- Translation: googletrans
- Embeddings: pymagnitude (manage vector embeddings easily), chakin (download pre-trained word vectors), sentence-transformers, InferSent, bert-as-service, sent2vec
- Multilingual support: polyglot, inltk (Indic languages), indic_nlp
- NLU: snips-nlu
- Semantic parsing: quepy
- Inflections: inflect
- Contractions: pycontractions
- Coreference Resolution: neuralcoref
- Readability: homer
- Grammar checking: language-check
- Topic Modeling: guidedlda, enstop
- Clustering: spherecluster (k-means with cosine distance), kneed (automatically find the number of clusters from the elbow curve), kmodes
- Metrics: seqeval (NER, POS tagging)
- String match: jellyfish (perform string and phonetic comparison), flashtext (superfast extract and replace keywords), pythonverbalexpressions (verbally describe regex), commonregex (ready-made regex for emails, phone numbers, etc.)
- Sentiment: vaderSentiment (rule based)
- Text distances: textdistance, editdistance
- PII removal: scrubadub
- Profanity detection: profanity-check
- Word clouds: stylecloud
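The text distance libraries listed above (textdistance, editdistance) compute metrics such as Levenshtein distance. As a reference point, here is a minimal pure-Python sketch of that one metric using the classic dynamic-programming recurrence. It is not either library's implementation, and `levenshtein` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the already processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

This runs in O(len(a) * len(b)) time, so the listed libraries are likely the better choice for large volumes of strings.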
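seqeval scores sequence labeling at the entity level rather than per token: a predicted entity counts only if both its type and its span match exactly. The sketch below is a simplified illustration of that idea for BIO tags, not seqeval's code. It assumes micro-averaging and lenient handling of an `I-` tag that opens a chunk, and it ignores other tagging schemes such as IOBES. `bio_chunks` and `entity_f1` are hypothetical names.

```python
def bio_chunks(tags):
    """Extract (type, start, end) spans from a list of BIO tags; end is exclusive."""
    chunks, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, label = tag.partition("-")
        # close the open chunk unless this tag continues it with the same type
        if kind is not None and not (prefix == "I" and label == kind):
            chunks.append((kind, start, i))
            kind = None
        # "B-" always opens a chunk; a stray "I-" is treated leniently as an opener
        if prefix == "B" or (prefix == "I" and kind is None):
            start, kind = i, label
    return set(chunks)


def entity_f1(y_true, y_pred):
    """Micro-averaged entity-level precision, recall and F1 over lists of tag sequences."""
    true = {(s,) + c for s, tags in enumerate(y_true) for c in bio_chunks(tags)}
    pred = {(s,) + c for s, tags in enumerate(y_pred) for c in bio_chunks(tags)}
    hits = len(true & pred)  # exact match on sentence, type and boundaries
    p = hits / len(pred) if pred else 0.0
    r = hits / len(true) if true else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]
print(entity_f1(y_true, y_pred))  # precision 1.0, recall 0.5, F1 approximately 0.667
```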
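kneed finds the "elbow" of a curve such as within-cluster inertia versus number of clusters, using the Kneedle algorithm. The sketch below uses a simpler, related heuristic instead, labeled here as an assumption rather than kneed's method: pick the point farthest from the straight line (chord) joining the first and last points. The `elbow` helper and the inertia values are hypothetical.

```python
def elbow(ks, inertias):
    """Return the k whose point lies farthest below the chord joining the curve's endpoints."""
    (x0, y0), (x1, y1) = (ks[0], inertias[0]), (ks[-1], inertias[-1])
    slope = (y1 - y0) / (x1 - x0)
    # For a decreasing convex curve the chord lies above the points. The vertical gap
    # is proportional to the perpendicular distance, so its maximum marks the elbow.
    gaps = [(y0 + slope * (k - x0)) - y for k, y in zip(ks, inertias)]
    return ks[gaps.index(max(gaps))]


ks = [1, 2, 3, 4, 5, 6]
inertias = [100, 50, 20, 15, 12, 10]  # made-up values for illustration
print(elbow(ks, inertias))  # 3
```

Kneedle additionally normalizes the curve and supports concave or increasing shapes and a sensitivity parameter, which is why a dedicated library is handy.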
Speech Recognition
RecSys
- Factorization machines (FM), and field-aware factorization machines (FFM): xlearn
- Scikit-learn-like API: surprise
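xlearn trains factorization machines (FM). In Rendle's second-order FM, the prediction is a linear model plus pairwise feature interactions whose weights are factorized as dot products of k-dimensional latent vectors. The pure-Python sketch below covers inference only (no training) and is not xlearn's API. It checks the usual O(n·k) rewrite of the pairwise term against the naive sum. The function names and parameter values are hypothetical. FFM differs in giving each feature a separate latent vector per field, which this sketch does not model.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score for one feature vector x.

    w0: global bias, w[i]: linear weight of feature i, V[i]: latent vector of feature i.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # sum_{i<j} <V[i], V[j]> x_i x_j computed in O(n*k) as
    # 0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit O(n^2 * k) double loop over feature pairs."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
    return linear + pairwise


# e.g. one-hot user (feature 0) and item (feature 2), made-up parameters
x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.3, -0.1]
V = [[0.5, 0.1], [0.2, 0.4], [0.3, -0.2]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))  # both approximately 0.33
```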
Computer Vision
Timeseries
Framework extensions
Phase: Monitoring
Model Training Monitoring
Phase: Optimization
Hyperparameter Optimization
Interpretability
Visualization
Phase: Production
Model Serialization
Scalability
Benchmark
API
Dashboard
Adversarial testing
Python libraries