5 Open problems in NLP

This blog post on NLP builds on a request from a Reddit user, and I have tried my best to address the most pressing bottlenecks that the NLP community faces.

5. Evaluation Metrics :

This is not an explicit bottleneck as such, but many researchers in the community want to shed light on often-neglected habits, such as blindly following certain architectures, datasets and evaluation metrics.

As the way is being paved for much higher-level cognitive tasks, understanding why certain techniques and architectures work well in certain scenarios would definitely help.

Another concern is the evaluation metrics themselves: how well do these techniques fit and generalise to the true variability of human languages and, better yet, how can we come up with much more interesting natural language inference datasets?


" I'd be much happier with benchmarks that understated progress than the current ones that seem to overstate it. " - George Dahl

Here are some pointers to build on this: here and here.
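To make the concern about metrics concrete, here is a minimal sketch of how a surface-overlap metric can reward the wrong output. The function `unigram_f1` and the three example sentences are made up for illustration and are not taken from any benchmark; real metrics are more elaborate, but simple overlap measures can show the same blind spot.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """F1 over shared lowercase whitespace tokens (illustrative metric only)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sentences.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "the movie was not good"
opposite_meaning = "the movie was good"   # drops "not", so the meaning flips
paraphrase = "i disliked the film"        # same meaning, different words

score_opposite = unigram_f1(reference, opposite_meaning)
score_paraphrase = unigram_f1(reference, paraphrase)
print(f"opposite meaning: {score_opposite:.2f}")
print(f"paraphrase:       {score_paraphrase:.2f}")
```

In this toy case the sentence with the opposite meaning scores far higher than the faithful paraphrase, which is one way a benchmark number can overstate progress.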

4. Lifelong learning :

Another hindrance the community faces is solving the problems of:

  • Lifelong adaptation of low-level models for downstream tasks

  • Applying transfer learning

  • Seamlessly integrating language-related modalities such as vision, text and audio

  • Effective cross-domain transfer for low-resource scenarios

" Ability to transfer models to new conditions, which includes learning under limited (or absence of) annotated resources. To be able to learn truly robust NLP models, for many more than the tiny amount of languages and domains which we can currently support " - Barbara Plank

You can check out Sebastian Ruder's blog to build on this, and more specifically his new paper.

3. Goal-oriented dialogue systems :

A recent spike in the number of papers on goal-oriented dialogue systems at ACL and EMNLP is evident in the ACL Anthology.

This leads to the next open problem: figuring out how to have longer goal- or task-oriented human-machine conversations that require real-world context and a knowledge base.

Task-driven dialogue systems with state tracking, dialogue systems using reinforcement learning and a bunch of other novel techniques are part of current active research.

Here is a rich, exhaustive set of slides from DeepDialogue on combining reinforcement learning (RL) with NLP.
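For readers new to the idea, here is a minimal, rule-based sketch of what state tracking against a knowledge base looks like in a task-driven dialogue. Everything in it is hypothetical: the slot vocabulary, the restaurant entries and the `DialogueState` class are invented for illustration, and current research systems learn this behaviour rather than hard-coding it.

```python
from dataclasses import dataclass, field

# Hypothetical slot vocabulary for a toy restaurant-booking task.
SLOT_VALUES = {
    "cuisine": {"italian", "indian", "thai"},
    "area": {"north", "south", "centre"},
}

# Hypothetical knowledge base the dialogue system can query.
KNOWLEDGE_BASE = [
    {"name": "Roma", "cuisine": "italian", "area": "centre"},
    {"name": "Spice Hut", "cuisine": "indian", "area": "north"},
    {"name": "Napoli", "cuisine": "italian", "area": "north"},
]


@dataclass
class DialogueState:
    """Accumulates slot values across turns of the conversation."""
    slots: dict = field(default_factory=dict)

    def update(self, utterance: str) -> None:
        # Naive keyword matching: any known slot value found in the
        # utterance overwrites the previous value for that slot.
        cleaned = utterance.lower().replace(",", " ").replace(".", " ")
        tokens = set(cleaned.split())
        for slot, values in SLOT_VALUES.items():
            for value in values & tokens:
                self.slots[slot] = value

    def query(self) -> list:
        # Return the names of all entries consistent with every filled slot.
        return [
            row["name"]
            for row in KNOWLEDGE_BASE
            if all(row[slot] == value for slot, value in self.slots.items())
        ]


state = DialogueState()
state.update("I would like Italian food.")
first_turn = state.query()
state.update("Somewhere in the north, please.")
second_turn = state.query()
print(first_turn, second_turn)
```

The second turn only makes sense because the state remembers the cuisine from the first turn. Keeping that context over much longer conversations, and with far messier language, is where the open problem lies.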

2. Low-resource languages :

This is the most pressing problem to be tackled.

There are approximately 7,000 languages in the world, but of these, only a small fraction (20 languages) are considered high-resource languages.

This is the direction where the community's effort has to be channelled and where the low-hanging fruit is, with insights to be explored and transferred to other languages.

There are hardly 6 papers on Papers with Code.

Some pointers to work on, given by community experts:

  • Methods for gathering data and training language models for under-resourced languages.

  • Effective cross-domain transfer for low-resource scenarios

Check out this link here for a much more thorough understanding of the problem. Here are also some pointers on African MT.

1. Natural Language Understanding :

This is the most open problem of all. It is tied up with many other connected problems that the community needs to solve to achieve this higher-level cognitive task, which may very well require insights and methodologies from other ML areas such as reinforcement learning, domain adaptation and zero-shot learning.

Below are some open pointers that one could work on.

  • Coreference resolution, Polysemy, Text/Document Summarization.

  • Arguments/reasoning, sarcasm and humour.

  • Representing large contexts efficiently.

  • Grounded language learning, i.e. jointly learning a world model and how to refer to it in natural language.

" Be ambitious. Do not limit yourself to reading NLP papers. Read a lot of machine learning, deep learning, reinforcement learning papers. " - Yoshua Bengio

Table of topics :

  1. Natural language understanding

  2. Ambiguity

  3. Synonymy

  4. Syntax

  5. Coherence

  6. Coreference

  7. Personality, intention and style

  8. Text Summarization

  9. Humour and ambiguity

  10. Polysemy

  11. Keyphrase extraction

  12. Knowledgebase population (KBP)

  13. NLP for low-resource scenarios

  14. Cross-lingual learning

  15. Bilingual dictionary induction

  16. Reasoning about large or multiple documents

  17. Multi-task learning

  18. Discourse parsing

  19. Task-independent architecture improvements

  20. Datasets, problems, and evaluation

  21. Task-independent data augmentation for NLP

  22. Few-shot learning for NLP

  23. Transfer learning for NLP

  24. Semi-supervised learning

  25. Frame-semantic parsing

This blog post would not have been possible without building on Sebastian Ruder's blog and his effort in compiling inputs from the NLP community into one document. You should definitely check that out.


Frontiers in Natural Language Processing Expert Responses.