Project 2 Week ?/4: Post-break re-planning

GOAL: Doing as much of the course as I can!

Once again, I’ve hit a change of plans. This time it has nothing to do with overestimating what I can accomplish! Instead, it’s because my schedule has changed dramatically since social distancing began. Since my classes are now primarily asynchronous, I’ve rearranged my hours to better suit my work needs. That means I’ve spread my 5 tech learning studio hours across the weekdays, one hour per day. As an added benefit, this format is well suited to my third project. More on that in a different post! Here, I just want to talk about working on Project 2.

Long story short, I’m 50/50 on whether I’ll finish the LinkedIn Learning course, and I likely won’t apply it to my project. I do plan on applying it sometime, just not within the scope of this project. For now, I’m doing as much of the course as I can.

I’m jumping back into the course after a 2-week break due to an extended spring break. Getting back into learning a technology feels very similar to the struggle of restarting my personal projects, like some of my dashboards. It’s challenging to get back into the flow of what I was learning when so many of these lessons depend on what came just before, in the same way that picking up a personal project means trying to jump back into a previous train of thought. What I have for this course that I don’t usually have, though, is my previous blog posts! Re-reading them was really helpful for reminding me of the bigger NLP picture and where I was 2 weeks ago. This experience is a good motivator to start taking notes while working on personal projects. In fact, there are many areas of my professional development that I’d like to document like this. Maybe keeping up this blog (or a similar one) while I work on new dashboards would be good? Using Trello boards for individual projects could help in a similar way.

I worked through a couple of chapters of the course and, honestly, once I got going it was engaging. Plus, I figured out what the lambda part of the function is for by seeing it used a couple of times: it defines a small, unnamed function that gets applied to every row of a column in the dataset, producing a new column. I finished learning about lemmatizing and stemming, and I’m already considering how I can apply these to both my personal projects and a work project. Lemmatizing seems well suited to a sizeable but manageable amount of text data, which describes 2 of my current projects. In one, I manually wrote code to look for variations on specific words. Now I know I can simplify that code and apply it more broadly using what I learned in just one chapter.
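Here is a minimal sketch of both ideas together, assuming pandas and NLTK are installed. The lambda passed to .apply() runs once per row, and stemming collapses word variants to a shared stem, so a single check can replace a hand-written list of variations. The DataFrame and the "connect" example are made up for illustration. The sketch uses NLTK's PorterStemmer because it needs no extra data downloads; lemmatizing with NLTK's WordNetLemmatizer follows the same pattern but requires the wordnet data package.

```python
import pandas as pd
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical column of free-text entries (not from the course or my projects).
df = pd.DataFrame({"text": ["Connected to the network", "connection issues", "still connecting"]})

# .apply() calls the lambda once for each row and collects the results into a new column.
df["stems"] = df["text"].apply(lambda s: [stemmer.stem(w) for w in s.lower().split()])

# One stem check replaces listing "connect", "connected", "connection", "connecting", ...
df["mentions_connect"] = df["stems"].apply(lambda stems: "connect" in stems)

print(df[["text", "mentions_connect"]])
```

The trade-off: stems like "issu" aren't always real words, which is fine for matching but not for display. Lemmatizing returns dictionary forms at the cost of needing extra data and being slower.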

Tomorrow, I’ll wrap up what I can of the course and reflect on Project 1 and Project 2.

Project 2 Week 2/4: Processing that language naturally!

GOAL: Continue the LinkedIn Learning course for NLP!

My plan is still as follows:

Week 1: Start the LinkedIn Learning NLP course

Week 2: Continue/finish the NLP course

Week 3: Finish NLP course if needed and plan how to incorporate NLP into my script

Week 4: Improve my script

(Because I had to miss class, I’m shifting these weeks back, but the work will still happen in these 4 chunks.)

I’m currently working on Week 2, continue/finish the NLP course. I’ve finally got a good flow for going through these and doing the exercises, so this week I’ll just document what goes well and what doesn’t.

I noticed immediately that finishing the course will probably take longer than I thought. I have about 3 hours 40 minutes left, but each lesson takes me about twice as long as the video because I keep pausing it to actually do the exercise. I am watching at 1.5x speed, though I might slow down in later videos, since I want to follow along and retain as much as possible. Therefore, it might take all of Week 3 to finish the course, leaving only Week 4 to implement it in my script. That should be plenty of time, since for the purposes of this project I would probably be adding just one small thing to my script.

An added bonus of this course I hadn’t noticed before is that it models a workflow for an NLP project. For example, Chapter 1 Exercise 4 walks you through important steps for understanding your data set. I’m learning process as well as technical skills, which will be nice when I want to apply these skills to my own work. The course also outlines the big-picture steps for NLP:

  1. Collect raw text
  2. Tokenize
  3. Clean text
  4. Vectorize
  5. Machine learning algorithm
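The five steps can be sketched end to end in plain Python. This is a toy version, not the course's code: it assumes two made-up documents and a tiny hand-picked stopword list. It stops after step 4 and only marks where a machine learning algorithm (for example, a scikit-learn classifier) would take over.

```python
import re
from collections import Counter

# 1. Collect raw text: two made-up messages stand in for a real corpus.
raw_docs = [
    "Free entry! Win a prize today.",
    "Are we still meeting today?",
]

# Tiny hand-picked stopword list for illustration; NLTK ships a much fuller one.
STOPWORDS = {"a", "are", "we", "the", "still"}


def tokenize(text: str) -> list[str]:
    """2. Tokenize: lowercase, split on runs of non-word characters, drop empty strings."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def clean(tokens: list[str]) -> list[str]:
    """3. Clean: remove stopwords."""
    return [tok for tok in tokens if tok not in STOPWORDS]


cleaned_docs = [clean(tokenize(doc)) for doc in raw_docs]

# Shared vocabulary across all documents, in a fixed (sorted) order.
vocab = sorted({tok for doc in cleaned_docs for tok in doc})


def vectorize(tokens: list[str]) -> list[int]:
    """4. Vectorize: bag-of-words counts, one slot per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


vectors = [vectorize(doc) for doc in cleaned_docs]

# 5. A machine learning algorithm would be trained on `vectors` here (not shown).
print(vocab)
print(vectors)
```

Each document becomes a list of numbers of the same length, which is the form most machine learning algorithms expect as input.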

A huge RegEx tip I didn’t know is that capitalizing a special character flips what it searches for. So, if \w matches word characters (letters, digits, and the underscore), \W matches non-word characters such as spaces and punctuation. Also, findall() and split() from Python’s re module are the main functions for tokenizing.
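A quick sketch of the capitalization flip and the two tokenizing approaches, using a made-up sentence. The same flip applies to other classes, such as \s (whitespace) versus \S (non-whitespace) and \d (digits) versus \D (non-digits).

```python
import re

text = "Hello, world! NLP is fun."

# \w+ matches runs of word characters (letters, digits, underscore).
words_findall = re.findall(r"\w+", text)

# \W+ matches runs of NON-word characters, so splitting on them leaves the words.
words_split = re.split(r"\W+", text)

# \S+ grabs runs of non-whitespace, so punctuation stays attached to words.
chunks = re.findall(r"\S+", text)

print(words_findall)
print(words_split)
print(chunks)
```

One difference worth noticing is that split() leaves an empty string at the end when the text ends in punctuation, while findall() only returns actual matches.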

As the course goes on, the functions being created are getting more complicated. They’re becoming a little harder for me to follow, but they’re still understandable. I’ve also started piecing together parts of complicated functions just by seeing them used a couple of times. Much like inferring vocabulary and grammar by listening to a language you’re learning, I’m inferring things like how lambda works in Python.
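A small illustration of what a lambda is: an anonymous function limited to a single expression. The sentences are made up. The point is that the lambda and the named function give the same results, and that lambdas are handy as throwaway arguments, such as a sort key.

```python
def count_words(text: str) -> int:
    """Named-function version: number of whitespace-separated words."""
    return len(text.split())


sentences = ["NLP is fun", "Lambdas are small throwaway functions", "Hi"]

# Same logic written as a lambda: an anonymous, single-expression function.
with_def = list(map(count_words, sentences))
with_lambda = list(map(lambda text: len(text.split()), sentences))

# Lambdas are most natural as short-lived arguments, e.g. a sort key.
shortest_first = sorted(sentences, key=lambda text: len(text.split()))

print(with_def, with_lambda)
print(shortest_first)
```

A lambda makes sense when the function is short and used once. Anything longer or reused is clearer as a named def.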

I’m also learning a lot of general Python skills that will improve how I manage data in Python, including some good practices I’m excited to apply when I revise my data scraping script.

As for using a web-hosted Jupyter notebook instead of a local one, I’ve only hit one hang-up, where I had to use a downloader for an NLTK package. Otherwise, it’s been going great! And it’s nice to have all these notebooks in my Google Drive.
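For anyone hitting the same snag, here is a minimal sketch of a setup cell. It assumes the missing package is NLTK's stopwords list, so swap in whichever package the error message names. The helper name ensure_nltk_data is hypothetical. nltk.data.find() raises LookupError when a resource isn't available locally, so the download only runs when it's needed. Because a Colab runtime can be reset, the cell has to run at the start of each session.

```python
import nltk


def ensure_nltk_data(resource_path: str, package: str) -> None:
    """Download an NLTK data package only if it isn't already available."""
    try:
        # Raises LookupError if the resource can't be found on nltk.data.path.
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package, quiet=True)
```

Usage in the first cell of a notebook would look like ensure_nltk_data("corpora/stopwords", "stopwords").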

Project 2 Week 1/4: Time to process some language

GOAL: Start the LinkedIn Learning course for NLP!

Oops, I’m doing a completely different thing! It’s natural language processing time. I wanted to work on my TAZ project, and after talking it through, I realized that introducing NLP into my scraping will make my data set much more reliable!

As I learned last project, LinkedIn Learning is really helpful for starting a brand new technology. That extra level of guidance is important for something I’ve never worked on before. While I’ve used Python for the TAZ script, I want to improve and supplement it. NLP is new enough for me that I think I need course guidance before I jump in.

My plan is as follows:

Week 1: Start the LinkedIn Learning NLP course

Week 2: Continue/finish the NLP course

Week 3: Finish NLP course if needed and plan how to incorporate NLP into my script

Week 4: Improve my script

I spent a while today clearing out my computer to make room for the course downloads and then installing the correct version of Python for NLTK. I had some difficulties running things locally, so I decided to work through the notebooks on Google Colab instead; I just had to make sure I imported the exercise files at the start of each notebook. It’s been working really well so far! Hoping I don’t run into any issues there.
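One way to handle the exercise files is sketched below, under some assumptions: the files were copied into a Google Drive folder, and the folder and file names are hypothetical. The first cell mounts Drive when it detects Colab and otherwise falls back to a local folder. drive.mount() is Colab's own helper and only exists inside a Colab runtime.

```python
import os

# Hypothetical folder names; point these at wherever the exercise files actually live.
DRIVE_EXERCISE_DIR = "/content/drive/MyDrive/nlp_course/exercise_files"
LOCAL_EXERCISE_DIR = "exercise_files"


def get_exercise_dir() -> str:
    """Return the exercise-file folder, mounting Google Drive first when running in Colab."""
    try:
        from google.colab import drive  # importable only inside a Colab runtime
    except ImportError:
        return LOCAL_EXERCISE_DIR
    drive.mount("/content/drive")
    return DRIVE_EXERCISE_DIR


EXERCISE_DIR = get_exercise_dir()

# Example: build a path to one of the data files (file name is hypothetical).
data_path = os.path.join(EXERCISE_DIR, "dataset.tsv")
print(data_path)
```

Keeping the files in Drive means they survive between Colab sessions, so each notebook only needs the mount step rather than a fresh upload.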

Otherwise, I’m just going through the course now. I’m also learning more about pandas by working through this, which is nice. So far I have the exact amount of previous Python knowledge needed for this course. It’s at a good pace for my skill level and I feel like I’m understanding everything being said.