Northeastern University

March 26: 2:30–3:30pm London/10:30–11:30am Boston/7:30–8:30am Oakland
Optional pre-workshop: Introduction to Python, virtual
March 27: 4:30–6:30pm London/12:30–2:30pm Boston/9:30–11:30am Oakland
Workshop: Introduction to Machine Learning for Text Analysis, virtual


Machine learning allows humans to create a model that acts as an extension of its creator's judgment, classifying data into predetermined categories. Manually tagging thousands of rows of data is often cumbersome and time-consuming. Pairing human judgment with a machine classifier can save researchers time and make it feasible to analyze and classify data on projects that would otherwise take an untenable number of working hours.


This workshop will teach participants how to use Python for machine learning and text classification, creating a human-machine relationship to process and classify textual datasets. Participants will learn how to use the Natural Language Toolkit (NLTK) to explore data. They will also use pandas, a Python library with extensive data-manipulation functionality, to clean and transform a dataframe (a table in pandas). Finally, they will learn how to engineer textual features and build machine learning classification pipelines with scikit-learn, a popular open-source machine learning library. Example projects that can use these methods include identifying a behavioral health component in police incident narratives, detecting hate speech on Facebook, and identifying wildlife trafficking posts on Twitter.
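To make "classification pipeline" concrete, here is a minimal sketch of one. It assumes that scikit-learn and pandas are installed. It uses TF-IDF as the engineered textual feature and logistic regression as the classifier. The posts and labels are hypothetical toy data loosely modeled on the wildlife-trafficking example. They are not workshop materials, and the workshop may use different features or models.

```python
# Minimal scikit-learn text-classification pipeline (illustrative sketch).
# The posts and labels below are hypothetical toy data, not workshop materials.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical hand-labeled examples standing in for a real annotated dataset.
df = pd.DataFrame({
    "text": [
        "ivory tusks for sale, message me for price",
        "rare pangolin scales available, shipping worldwide",
        "selling tiger skin, serious buyers only",
        "saw a beautiful elephant at the zoo today",
        "watching a nature documentary about tigers",
        "our local park has so many birds this spring",
    ],
    "label": ["trafficking", "trafficking", "trafficking", "other", "other", "other"],
})

# Step 1 turns raw text into TF-IDF features; step 2 fits a linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression()),
])
pipeline.fit(df["text"], df["label"])

# Apply the fitted pipeline to new, unlabeled posts.
new_posts = ["tusks for sale, message me", "documentary about elephants at the zoo"]
predictions = pipeline.predict(new_posts)
print(list(zip(new_posts, predictions)))
```

Because vectorizing and classifying are bundled into one Pipeline, the same preprocessing is applied at training and prediction time. With only six training rows, the predictions are not meaningful; a real project would need far more labeled data and a held-out evaluation set.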


The optional pre-workshop covers the Python programming skills required to participate successfully in the workshop. Learners who are unfamiliar with using the pandas .loc indexer and Python string methods to manipulate data and create new columns should plan to attend the pre-workshop; all are welcome.
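For a sense of the skills this refers to, here is a small sketch. It uses pandas string methods (through the .str accessor) and the .loc indexer to derive new columns from a text column. The narratives, column names, and the keyword rule are hypothetical, chosen only for illustration. This is not the pre-workshop's actual exercise.

```python
# Illustrative pandas sketch: string methods and .loc to build new columns.
# The narratives, column names, and keyword rule are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "narrative": [
        "  Caller reported a Person in Crisis near the station ",
        "Vehicle break-in reported on Main St",
        "Welfare check requested; subject in crisis",
    ]
})

# String methods via the .str accessor: trim whitespace, then lowercase.
df["clean"] = df["narrative"].str.strip().str.lower()

# New boolean column from a plain substring test (no regex).
df["mentions_crisis"] = df["clean"].str.contains("crisis", regex=False)

# .loc with a boolean mask assigns a value only to the matching rows.
df["category"] = "other"
df.loc[df["mentions_crisis"], "category"] = "behavioral_health"

# Word count per row: split on whitespace, then take each list's length.
df["word_count"] = df["clean"].str.split().str.len()

print(df[["clean", "category", "word_count"]])
```

A keyword rule like this is brittle. Hand-built columns of this kind are typically where manual labeling or feature engineering starts, before a trained classifier takes over.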


This event is free and open to the public, but registration is required.

Event Details

  • Nhat Pham
