Silicon Valley Code Camp : October 9th and 10th, 2010

Daniel Cer

About Daniel
I am a final-year Ph.D. candidate at the University of Colorado at Boulder, and I work in the Stanford NLP lab with Dan Jurafsky and Chris Manning. I do research in natural language processing (NLP) with a focus on statistical machine translation (SMT/MT). I am one of the primary authors of the Stanford Phrasal machine translation system. I have also done some work in textual entailment, speech recognition, and parsing.

Speaking Sessions

  • Brief Introduction to Natural Language Processing

    3:30 PM Saturday   Room: 8338
    The majority of interesting information on the web is in the form of unstructured natural language data, written by humans for consumption by other humans. Natural language processing tools allow us to take data such as news articles, blog posts, tweets, and reviews and extract meaningful structured information from it. For example, using named-entity recognition and sentiment analysis, your code can look at a document and identify which people, organizations, products, and places are mentioned in it and whether they are described in a positive or negative light. Using natural language parsers, it's possible to take a sentence and recover who's doing what to whom. Other tools can automatically construct tag sets or identify interesting characteristic phrases.

    In this session, I will provide a brief introduction to natural language processing and an overview of the toolkits, APIs, and libraries that are available. Code samples will be presented in Python and Java. However, the talk should be of general interest to anyone working with language data.

    Topics that will be covered include:

    * Sentiment analysis
    * Identification of named entities (e.g., people, organizations, and locations)
    * Natural language parsing
    * Document classification and automatic extraction of tag sets
    * Summarization of documents

    Toolkits that will be covered include Python's Natural Language Toolkit (NLTK) and Stanford's JavaNLP. The APIs discussed will be OpenCalais and AlchemyAPI.
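
    As a rough illustration of the sentiment analysis task mentioned in the abstract, here is a minimal sketch in plain Python. It is not taken from the session and does not use NLTK, JavaNLP, OpenCalais, or AlchemyAPI. The word lists and the function names (`tokenize`, `sentiment_score`, `sentiment_label`) are hypothetical and chosen only for illustration. Real systems typically rely on much larger lexicons or trained classifiers.

```python
import re

# Hypothetical toy lexicons (assumption, for illustration only).
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive tokens minus negative tokens."""
    tokens = tokenize(text)
    positives = sum(1 for t in tokens if t in POSITIVE)
    negatives = sum(1 for t in tokens if t in NEGATIVE)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse label: positive, negative, or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in (
        "The battery life is great and the screen is excellent.",
        "Terrible support and slow shipping.",
        "The package arrived on Tuesday.",
    ):
        print(sentiment_label(review), "|", review)
```

    Counting lexicon hits like this ignores negation ("not good") and context, which is one reason dedicated toolkits and trained models are generally preferred for real data.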