Kyubyong / nlp_tasks
- Sunday, October 15, 2017, 03:15:07
Natural Language Processing Tasks and References
I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.
I did my best to cover as many NLP tasks as possible, but admittedly this is far from exhaustive, purely due to my lack of knowledge. The selected references are also biased towards recent deep learning accomplishments. I expect these to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.
by Kyubyong
PAPER Automatic Text Scoring Using Neural Networks
PAPER A Neural Approach to Automated Essay Scoring
CHALLENGE Kaggle: The Hewlett Foundation: Automated Essay Scoring
PROJECT EASE (Enhanced AI Scoring Engine)
WIKI Speech recognition
PAPER Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
PAPER WaveNet: A Generative Model for Raw Audio
PROJECT A TensorFlow implementation of Baidu's DeepSpeech architecture
PROJECT Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
CHALLENGE The 5th CHiME Speech Separation and Recognition Challenge
DATA The 5th CHiME Speech Separation and Recognition Challenge
DATA CSTR VCTK Corpus
DATA LibriSpeech ASR corpus
DATA Switchboard-1 Telephone Speech Corpus
DATA TED-LIUM Corpus
WIKI Automatic summarization
BOOK Automatic Text Summarization
PAPER Text Summarization Using Neural Networks
PAPER Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
DATA Text Analytics Conferences (TAC)
DATA Document Understanding Conferences (DUC)
INFO Coreference Resolution
PAPER Deep Reinforcement Learning for Mention-Ranking Coreference Models
PAPER Improving Coreference Resolution by Learning Entity-Level Distributed Representations
CHALLENGE CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
CHALLENGE CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
PAPER Neural Network Translation Models for Grammatical Error Correction
CHALLENGE CoNLL-2013 Shared Task: Grammatical Error Correction
CHALLENGE CoNLL-2014 Shared Task: Grammatical Error Correction
DATA NUS Non-commercial research/trial corpus license
DATA Lang-8 Learner Corpora
DATA Cornell Movie--Dialogs Corpus
PROJECT Deep Text Corrector
PRODUCT deep grammar
PAPER Grapheme-to-Phoneme Models for (Almost) Any Language
PAPER Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
PAPER Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
PROJECT Sequence-to-Sequence G2P toolkit
DATA Multilingual Pronunciation Data
WIKI Language identification
PAPER AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS
CHALLENGE 2015 Language Recognition Evaluation
WIKI Language model
TOOLKIT KenLM Language Model Toolkit
PAPER Distributed Representations of Words and Phrases and their Compositionality
PAPER Character-Aware Neural Language Models
DATA Penn Treebank
WIKI Lemmatisation
PAPER Joint Lemmatization and Morphological Tagging with LEMMING
TOOLKIT WordNet Lemmatizer
DATA Treebank-3
WIKI Lip reading
PAPER Lip Reading Sentences in the Wild
PAPER 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
PROJECT Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
DATA The GRID audiovisual sentence corpus
PAPER Neural Machine Translation by Jointly Learning to Align and Translate
PAPER Neural Machine Translation in Linear Time
PAPER Attention Is All You Need
CHALLENGE ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
CHALLENGE EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
DATA OpenSubtitles2016
DATA WIT3: Web Inventory of Transcribed and Translated Talks
DATA The QCRI Educational Domain (QED) Corpus
WIKI Inflection
PAPER Morphological Inflection Generation Using Character Sequence to Sequence Learning
CHALLENGE SIGMORPHON 2016 Shared Task: Morphological Reinflection
DATA sigmorphon2016
WIKI Named-entity recognition
PAPER Neural Architectures for Named Entity Recognition
PROJECT OSU Twitter NLP Tools
CHALLENGE Named Entity Recognition in Twitter
CHALLENGE CoNLL 2002 Language-Independent Named Entity Recognition
CHALLENGE Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
DATA CoNLL-2002 NER corpus
DATA CoNLL-2003 NER corpus
DATA NUT Named Entity Recognition in Twitter Shared task
PAPER Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
PROJECT Paralex: Paraphrase-Driven Learning for Open Question Answering
DATA Microsoft Research Paraphrase Corpus
DATA Microsoft Research Video Description Corpus
DATA Pascal Dataset
DATA Flickr Dataset
DATA The SICK data set
DATA PPDB: The Paraphrase Database
DATA WikiAnswers Paraphrase Corpus
PAPER Neural Paraphrase Generation with Stacked Residual LSTM Networks
PAPER A Deep Generative Framework for Paraphrase Generation
PAPER Paraphrasing Revisited with Neural Machine Translation
WIKI Parsing
TOOLKIT The Stanford Parser: A statistical parser
TOOLKIT spaCy parser
PAPER A fast and accurate dependency parser using neural networks
CHALLENGE CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
CHALLENGE CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
CHALLENGE CoNLL 2015 Shared Task: Shallow Discourse Parsing
CHALLENGE SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!
WIKI Part-of-speech tagging
PAPER Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
PAPER Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
DATA Treebank-3
TOOLKIT nltk.tag package
WIKI Pinyin input method
PAPER Neural Network Language Model for Chinese Pinyin Input Method Engine
PROJECT Neural Chinese Transliterator
WIKI Question answering
PAPER Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
PAPER Dynamic Memory Networks for Visual and Textual Question Answering
CHALLENGE TREC Question Answering Task
CHALLENGE NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
CHALLENGE CLEF Question Answering Track
CHALLENGE SemEval-2017 Task 3: Community Question Answering
DATA MS MARCO: Microsoft MAchine Reading COmprehension Dataset
DATA Maluuba NewsQA
DATA SQuAD: 100,000+ Questions for Machine Comprehension of Text
DATA GraphQuestions: A Characteristic-rich Question Answering Dataset
DATA Story Cloze Test and ROCStories Corpora
DATA Microsoft Research WikiQA Corpus
DATA DeepMind Q&A Dataset
DATA QASent
WIKI Relationship extraction
PAPER A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
WIKI Semantic role labeling
BOOK Semantic Role Labeling
PAPER End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
PAPER Neural Semantic Role Labeling with Dependency Path Embeddings
PAPER Deep Semantic Role Labeling: What Works and What's Next
CHALLENGE CoNLL-2005 Shared Task: Semantic Role Labeling
CHALLENGE CoNLL-2004 Shared Task: Semantic Role Labeling
TOOLKIT Illinois Semantic Role Labeler (SRL)
DATA CoNLL-2005 Shared Task: Semantic Role Labeling
WIKI Sentence boundary disambiguation
PAPER A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
TOOLKIT NLTK Tokenizers
DATA The British National Corpus
DATA Switchboard-1 Telephone Speech Corpus
WIKI Sentiment analysis
INFO Awesome Sentiment Analysis
CHALLENGE Kaggle: UMICH SI650 - Sentiment Classification
CHALLENGE SemEval-2017 Task 4: Sentiment Analysis in Twitter
CHALLENGE SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
PROJECT SenticNet
DATA Multi-Domain Sentiment Dataset (version 2.0)
DATA Stanford Sentiment Treebank
DATA Twitter Sentiment Corpus
DATA Twitter Sentiment Analysis Training Corpus
DATA AFINN: List of English words rated for valence
WIKI Source separation
PAPER From Blind to Guided Audio Source Separation
PAPER Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
CHALLENGE Signal Separation Evaluation Campaign (SiSEC)
CHALLENGE CHiME Speech Separation and Recognition Challenge
WIKI Speaker diarisation
PAPER DNN-based speaker clustering for speaker diarisation
PAPER Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
PAPER Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
CHALLENGE Rich Transcription Evaluation
WIKI Speaker recognition
PAPER A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK
PAPER DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
CHALLENGE NIST Speaker Recognition Evaluation (SRE)
INFO Are there any suggestions for free databases for speaker recognition?
WIKI Speech segmentation
PAPER Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
PAPER Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
PAPER Unsupervised Lexicon Discovery from Acoustic Input
PAPER Weakly supervised spoken term discovery using cross-lingual side information
DATA CALLHOME Spanish Speech
WIKI Speech synthesis
PAPER WaveNet: A Generative Model for Raw Audio
PAPER Tacotron: Towards End-to-End Speech Synthesis
PAPER Deep Voice 2: Multi-Speaker Neural Text-to-Speech
DATA The World English Bible
DATA LJ Speech Dataset
DATA Lessac Data
CHALLENGE Blizzard Challenge 2017
PRODUCT Lyrebird
PROJECT The Festvox project
TOOLKIT Merlin: The Neural Network (NN) based Speech Synthesis System
WIKI Speech enhancement
BOOK Speech enhancement: theory and practice
PAPER An Experimental Study on Speech Enhancement Based on Deep Neural Network
PAPER A Regression Approach to Speech Enhancement Based on Deep Neural Networks
PAPER Speech Enhancement Based on Deep Denoising Autoencoder
WIKI Stemming
PAPER A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING
TOOLKIT NLTK Stemmers
WIKI Terminology extraction
PAPER Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
WIKI Text simplification
PAPER Aligning Sentences from Standard Wikipedia to Simple Wikipedia
PAPER Problems in Current Text Simplification Research: New Data Can Help
DATA Newsela Data
WIKI Textual entailment
PROJECT Textual Entailment with TensorFlow
PAPER Textual Entailment with Structured Attentions and Composition
CHALLENGE SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
CHALLENGE SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
WIKI Transliteration
INFO Transliteration of Non-Latin scripts
PAPER A Deep Learning Approach to Machine Transliteration
CHALLENGE NEWS 2016 Shared Task on Transliteration of Named Entities
PROJECT Neural Japanese Transliteration: can you do better than SwiftKey™ Keyboard?
PAPER PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
PROJECT An implementation of voice conversion system utilizing phonetic posteriorgrams
CHALLENGE Voice Conversion Challenge 2016
CHALLENGE Voice Conversion Challenge 2018
DATA CMU_ARCTIC speech synthesis databases
DATA TIMIT Acoustic-Phonetic Continuous Speech Corpus
WIKI Word embedding
TOOLKIT Gensim: word2vec
TOOLKIT fastText
TOOLKIT GloVe: Global Vectors for Word Representation
INFO Where to get a pretrained model
PROJECT Pre-trained word vectors of 30+ languages
PROJECT Polyglot: Distributed word representations for multilingual NLP
INFO What is Word Prediction?
PAPER The prediction of character based on recurrent neural network language model
PAPER An Embedded Deep Learning based Word Prediction
PAPER Evaluating Word Prediction: Framing Keystroke Savings
DATA An Embedded Deep Learning based Word Prediction
PROJECT Word Prediction using Convolutional Neural Networks: can you do better than iPhone™ Keyboard?
WIKI Word segmentation
PAPER Neural Word Segmentation Learning for Chinese
PROJECT Convolutional neural network for Chinese word segmentation
TOOLKIT Stanford Word Segmenter
TOOLKIT NLTK Tokenizers