Session Details

Brief Introduction to Natural Language Processing  

Level: Intermediate
Date: 3:30 PM Saturday
Room: 8338

Presentation

The majority of interesting information on the web is in the form of unstructured natural language data, written by humans for consumption by other humans. Natural language processing tools allow us to take data such as news articles, blog posts, tweets, and reviews and extract meaningful structured information from it. For example, using named-entity recognition and sentiment analysis, your code can look at a document and identify which people, organizations, products, and places are mentioned in it and whether they are described in a positive or negative light. Using natural language parsers, it's possible to take a sentence and recover who is doing what to whom. Other tools can automatically construct tag sets or identify interesting characteristic phrases.

In this session, I will provide a brief introduction to natural language processing and an overview of the tool sets, APIs, and libraries that are available. Code samples will be presented in Python and Java. However, the talk should be of general interest to anyone working with language data.

Topics that will be covered include:

* Sentiment analysis
* Identification of named entities (e.g., people, organizations, and locations)
* Natural language parsing
* Document classification and automatic extraction of tag sets
* Summarization of documents

Toolkits that will be covered include Python's Natural Language Toolkit (NLTK) and Stanford's JavaNLP. APIs discussed will be OpenCalais and AlchemyAPI.
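
As a rough illustration of what "extracting structured information" from free text can mean, here is a minimal Python sketch using only the standard library. It is not taken from the talk and does not use NLTK, JavaNLP, OpenCalais, or AlchemyAPI. The word lists and the function names `sentiment_score` and `candidate_entities` are hypothetical, invented for this example, and the techniques (word counting and capitalization matching) are deliberately naive stand-ins for real sentiment analysis and named-entity recognition.

```python
import re

# Hypothetical toy lexicons, invented for this illustration only.
# Real sentiment analyzers use far larger lexicons or trained models.
POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"terrible", "awful", "hate", "bad"}


def sentiment_score(text):
    """Return (count of positive words) minus (count of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each membership test is a bool, so the difference is +1, 0, or -1.
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def candidate_entities(text):
    """Return runs of consecutive capitalized words as entity candidates.

    This is a crude heuristic: it also picks up ordinary words that are
    capitalized only because they start a sentence.
    """
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)


review = "I love the new phone from Acme Labs. The camera is excellent."
score = sentiment_score(review)
entities = candidate_entities(review)
```

Here `score` is positive because the review contains two words from the positive list and none from the negative list, and `entities` contains "Acme Labs" alongside the false positive "The". Errors of that kind are one reason dedicated toolkits rely on statistical models and context rather than surface patterns.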

The Speaker(s)


Daniel Cer

I am a final-year Ph.D. candidate at the University of Colorado at Boulder, and I work in the Stanford NLP lab with Dan Jurafsky and Chris Manning. I do research in natural language processing (NLP) with a focus on statistical machine translation (SMT/MT). I am one of the primary authors of the Stanford Phrasal machine translation system. I have also done some work in textual entailment, speech recognition, and parsing.
