Project 2 Week 2/4: Processing that language naturally!

GOAL: Continue the LinkedIn Learning course for NLP!

My plan is still as follows:

Week 1: Start the LinkedIn Learning NLP course

Week 2: Continue/finish the NLP course

Week 3: Finish NLP course if needed and plan how to incorporate NLP into my script

Week 4: Improve my script

(Except that, because I had to miss class, I’m shifting my weeks back, but they’ll still be done in these 4 chunks.)

I’m currently working on Week 2, continue/finish the NLP course. I’ve finally got a good flow for going through these and doing the exercises, so this week I’ll just document what goes well and what doesn’t.

I’ve noticed immediately that it’s probably going to take longer than I thought to finish the course. I have about 3 hours 40 minutes left, but each lesson takes me about twice as long as its runtime because I keep starting and stopping the video to actually do the exercises. I am watching at 1.5x speed, and I might pause less in later videos, but I do want to follow along to retain as much as possible. So it might take all of Week 3 to finish the course, leaving only Week 4 to implement it in my script. That should be plenty of time, since I’ll probably be adding just one small thing to my script for the purposes of this project.

An added bonus of this course I hadn’t noticed before is that it models a workflow for an NLP project. For example, Chapter 1 Exercise 4 takes you through important steps in understanding your data set. I’m learning the process as well as the technical skills, so that’s nice for when I want to apply these skills to my own work. They also outline the big-picture steps for NLP:

  1. Collect raw text
  2. Tokenize
  3. Clean text
  4. Vectorize
  5. Machine learning algorithm
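
Here’s a minimal sketch of what steps 1 through 4 could look like using only the standard library. The sample sentences, the tiny `STOPWORDS` set, and the helper names `tokenize` and `clean` are made up for illustration and are not taken from the course; step 5 is left out because it would normally rely on a machine learning library.

```python
import re
from collections import Counter

# 1. Collect raw text (two made-up sentences standing in for a real data set)
raw_docs = ["I love NLP!", "NLP is fun, and I love Python."]

# Hypothetical, deliberately tiny stopword list for this sketch
STOPWORDS = {"i", "is", "and"}

def tokenize(text):
    # 2. Tokenize: lowercase, then pull out runs of word characters
    return re.findall(r"\w+", text.lower())

def clean(tokens):
    # 3. Clean text: drop the stopwords
    return [t for t in tokens if t not in STOPWORDS]

cleaned = [clean(tokenize(doc)) for doc in raw_docs]

# 4. Vectorize: one count per vocabulary term, per document
vocab = sorted({t for doc in cleaned for t in doc})
vectors = [[Counter(doc)[term] for term in vocab] for doc in cleaned]

print(vocab)    # ['fun', 'love', 'nlp', 'python']
print(vectors)  # [[0, 1, 1, 0], [1, 1, 1, 1]]
```

Each row in `vectors` is one document, and each column is a word from `vocab`, which is the kind of numeric table a machine learning algorithm in step 5 would take as input.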

A huge RegEx tip I didn’t know is that capitalizing a character class flips what is being searched for. So, if \w matches word characters (letters, digits, and underscores), capital \W matches non-word characters. Also, findall() and split() are the main functions for tokenizing.
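
A quick sketch of that tip using Python’s built-in `re` module. The sample sentence is made up; `re.findall` and `re.split` are the standard-library versions of the two functions mentioned above.

```python
import re

text = "Tokenize this, please!"

# \w+ grabs runs of word characters, so findall returns the tokens directly
words = re.findall(r"\w+", text)
print(words)   # ['Tokenize', 'this', 'please']

# \W+ is the flipped class: split on runs of NON-word characters instead
pieces = re.split(r"\W+", text)
print(pieces)  # ['Tokenize', 'this', 'please', '']

# The same flip works for whitespace: \s is whitespace, \S is non-whitespace
chunks = re.findall(r"\S+", text)
print(chunks)  # ['Tokenize', 'this,', 'please!']
```

Note the empty string at the end of the `split` result: the text ends with a non-word character, so there is nothing after the last separator. That’s one reason `findall` can be the tidier choice for tokenizing.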

As the course goes on, the functions being created get more complicated. It’s becoming a little harder for me to follow, but it’s definitely still understandable. I’ve also started piecing together how parts of complicated functions work just by seeing them used a couple of times. Much like inferring vocab and grammar by listening to a language you’re learning, I’m inferring things like how lambda works in Python.
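
For anyone else puzzling over lambda, here’s a small sketch of the idea: a lambda is just a one-line function without a `def`. The function names and sample messages are hypothetical and not from the course.

```python
# A regular function that counts whitespace-separated tokens
def count_tokens(text):
    return len(text.split())

# The same thing written as a lambda (named here only so the two can be compared)
count_tokens_lambda = lambda text: len(text.split())

messages = ["free entry now", "ok", "see you at noon today"]

# Lambdas are usually written inline, right where a small function is needed
lengths = list(map(lambda text: len(text.split()), messages))
longest_first = sorted(messages, key=lambda text: len(text.split()), reverse=True)

print(lengths)        # [3, 1, 5]
print(longest_first)  # ['see you at noon today', 'free entry now', 'ok']
```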

I’m also learning a lot of general Python skills that will improve how I manage data in Python, along with some good practices that I’m excited to apply when I revise my data scraping script.

As for using a web-hosted Jupyter notebook instead of a local one, I’ve only encountered one hang-up, where I had to use a downloader for an NLTK package. Otherwise that’s been going great! And it’s nice to have all these notebooks in my Google Drive.
