IST 736: Text Mining

This course introduces concepts and methods for gaining insight from large amounts of text data. Its main goal is to increase students' awareness of the power of large text collections and of the computational methods used to find patterns in large text corpora. It introduces text mining technologies rooted in machine learning, natural language processing and statistics, and showcases their applications in information organization and access, business intelligence, social behavior analysis and digital humanities. Students also focus on machine learning for unstructured data using scikit-learn, a Python machine learning library, and a range of techniques such as Naïve Bayes and support vector machines.
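As an illustration of the kind of scikit-learn workflow described above, the sketch below trains a Naïve Bayes classifier and a linear support vector machine on the same text features. It is a minimal sketch, not course material: the four-document labeled corpus is hypothetical toy data, and the choice of TF-IDF features with `MultinomialNB` and `LinearSVC` is an assumption made for illustration.

```python
# Minimal sketch: supervised text classification with scikit-learn.
# The tiny labeled corpus below is hypothetical toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "great product, works perfectly",
    "excellent service and fast delivery",
    "terrible quality, broke after a day",
    "awful support, very disappointed",
]
train_labels = ["pos", "pos", "neg", "neg"]

test_docs = ["fast delivery and great quality", "very disappointed, terrible product"]

# Each pipeline converts raw text to TF-IDF vectors, then fits a classifier.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

predictions = {}
for name, model in models.items():
    model.fit(train_docs, train_labels)
    predictions[name] = list(model.predict(test_docs))
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary learned from the training text is reused unchanged when new documents are classified.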

Learning Objectives:

  • Describe basic concepts and methods in text mining, such as document representation, information extraction, text classification and clustering, and topic modeling
  • Use benchmark corpora and commercial and open-source text analysis and visualization tools to explore interesting patterns
  • Understand conceptually how advanced text mining algorithms for information extraction, text classification and clustering, and opinion mining work, and how they apply to real-world problems
  • Choose appropriate technologies for specific text analysis tasks, and evaluate the benefits and challenges of the chosen technical solution

Concepts & Tools:

  • Python / scikit-learn
  • Machine learning
  • Natural language processing (NLP)
  • Reproducible research
  • Corpus statistics
  • Information organization and access
  • Business intelligence
  • Social behavioral analysis
  • Digital humanities
