Project 2 Week 4/4: End and begin!

GOAL: Wrap up, reflect, plan

Admittedly, I did not get much farther in the course over the past day. But I feel like I have a solid foundation in NLP and I can easily wrap up this course in the future. I do plan to, as I really would like to apply it to a set of text data that I have access to at work. I’d be excited to get a better count of who reported using specific tools and techniques by lemmatizing the data. The data set is just large enough to be analyzed that way, and small enough to use more computationally intensive approaches.

I realized that NLP is actually more understandable than I thought it would be. Once I got past the hurdle of understanding and using NLTK, I found that a lot of the data skills I’ve learned at the iSchool so far make NLP really easy to understand. To be entirely honest, it’s validating to see what I’ve learned over the past 2 years be applicable to new skills. This familiarity also allows me to take bits and pieces of the NLP process and apply those techniques in different ways to my own projects. Because I now know and understand the basics of NLP, I can either use that entire established process from start to finish or “mix and match” the toolkit functions I’ve learned.

The end of this project marks the end of my non-physical technology projects. I learned the basics of JavaScript and NLP for the first and second projects. Both these projects required familiarizing myself with an entirely new skill, so I found that it was much harder than expected to get started with each. However, now that I’ve done the more difficult part of getting started with a huge new concept, I feel confident in my ability to keep up these skills as I move forward in my career. In a larger sense, I have a much better sense of how long it takes to teach yourself a new skill. Turns out: takes a while! But now I can move forward with continuing these skills or starting new ones with a better idea of how to plan for it.

So even though after 2 projects I’ve finally figured out how to get the most out of these self-teaching opportunities, I’m flipping the script a little with my third project. It’s a “physical”-ish project; I’d like to get better at listening to and speaking Spanish. While I’ve been working on Spanish grammar and vocabulary for about 1-2 months, I’ve been finding that I’m not really progressing in my listening and speaking skills. I want to dedicate my efforts to practicing Spanish in ways that do not just help me learn the language, but also help me practice its use.

There are three components I’m focusing on:

  • Listening
  • Constructing sentences
  • Speaking

The first major difference with this project is that, now that we’re quarantined, I’ve had to switch up my schedule. I’ve decided to take this opportunity to move from a 5-hour studio learning session to daily 1-hour practice sessions. This is much more suitable for language learning anyway, since daily exposure and repetition are important.

The second major difference is that instead of using one course or project, I’m using multiple activities to practice these three components. The most important of my activities is to continue my Language Transfer course. Language Transfer is an entirely free language-learning resource that, for Spanish, is a series of 90 short audio lessons. I’m around chapter 60 or so now and I can’t recommend it enough; I’ve learned so much in such a short time and the lessons are engaging and interesting. The instructor teaches by asking you to construct and speak sentences using whatever grammar rule or vocabulary he just talked about. Previously, I’ve been either just listening (there’s a student he teaches in the audio and I listen to her answers) or writing down my answers. I’d like to finish up the course and start always answering aloud, practicing both the speed of my answer and my pronunciation. So, Language Transfer will help with all three components.

Each weekday I plan to listen to one or two Language Transfer lessons and do one other activity. I have activities planned for each day, as follows:

  • Thursday: Journal (constructing sentences)

I want to journal in Spanish to help practice constructing my own sentences. I’d like to read it aloud as I go too.

  • Friday: Music (listening)

I’m going to listen to music, try to write down what I hear, and check. This hopefully will help me better identify words and sounds.

  • Monday: TV (listening)

I’m going to watch a TV show in Spanish with Spanish closed-captioning (not dubs with subs, since they don’t always match), listening closely to the words and slowly trying to rely less on the subs.

  • Tuesday: Read aloud (speaking)

I’m less certain of this one, but I’d like to try reading simple stories written in English aloud in Spanish.

  • Wednesday: Test (all three)

I have the added benefit of being quarantined with a native Spanish speaker, Cosme. Each Wednesday, I’ll spend my hour trying to speak with him, and listening and writing about the feedback he gives me. This will be a helpful metric for my improvement and how I should switch up the activities.

Because I do need to expand my vocabulary outside of my daily tech studio hours, I’m going to be immersing myself in grammar and more vocab. The biggest help right now is that I’ve started playing Animal Crossing in Spanish. I plan on keeping this up throughout the month, so hopefully that will expose me to a lot of words and phrases. So far, Cosme has been helping me by answering questions I have about phrases and idioms that pop up in the game.

Project 2 Week ?/4: Post-break re-planning

GOAL: Doing as much of the course as I can!

Once again, I’ve hit a change of plans. This time it has nothing to do with my own overestimation of what I can accomplish! Instead, it’s due to my schedule changing rather dramatically since social distancing began. Since my classes are primarily asynchronous now, I’ve switched my hours to better suit my work needs, which means that I distributed my 5 tech learning studio hours over each weekday, one hour per day. There’s the added benefit of this format being very suitable for my third project. More on that in a different post! Here, I just want to talk about working on Project 2.

Long story short, I’m 50/50 on whether I’ll finish the LinkedIn Learning course, and I likely won’t apply it to my project. I do plan on applying it sometime, though, just not within the scope of this project. For now, I’m doing as much of the course as I can.

I’m jumping back into the course after a 2-week break due to extended spring break. Trying to get back into learning a technology feels very similar to the struggle of getting started working on my personal projects, like some of my dashboards. It’s challenging to get back into the flow of what I was learning when so many of these lessons depend on what I was doing just before, in the same way that picking up personal projects means trying to jump back into a previous train of thought. What I have for this course that I don’t usually have, though, are my previous blog posts! Re-reading my posts was really helpful to remind me of the bigger NLP picture and where I was 2 weeks ago. This experience serves as a good motivator to start taking notes while working on personal projects. In fact, there are many areas of my personal professional development that I’d like to better document like this. Maybe keeping up this blog (or a similar one) for when I work on new dashboards would be good? Also, using Trello boards for individual projects could similarly help.

I worked through a couple of chapters of the course and honestly, once I got going it was engaging. Plus, I figured out what the lambda part of the function is for by seeing it used a couple of times (it defines a small unnamed function inline, which then gets applied to every row of a column of the dataset to output a new list). I finished learning about lemmatizing and stemming and I’m already considering how I can apply these to both my personal projects and a work project. Lemmatizing seems well-suited to a sizeable but manageable amount of text data, which applies to 2 current projects. In one, I manually wrote code to look for variations on specific words. Now I know I can simplify that code and easily apply it broadly using what I learned in just one of the chapters.
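As a minimal sketch of the lambda pattern described here: the example below uses only the standard library, with a plain list standing in for a dataset column. The sample responses and variable names are hypothetical, not taken from the course. In pandas, the analogous call would be `df["column"].apply(lambda row: ...)`.

```python
# A "column" of free-text responses, represented here as a plain list.
# These sample rows are made up for illustration.
responses = ["Used Python daily", "Tried Tableau once", "python and R"]

# The lambda is an unnamed one-argument function; map() calls it on every row
# and list() collects the results into a new list of the same length.
tokenized = list(map(lambda row: row.lower().split(), responses))

print(tokenized)
```

Each row comes back as a list of lowercase words, so the output has one entry per input row, much like a new column.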

Tomorrow, I’ll wrap up what I can of the course and reflect on Project 1 and Project 2.

Project 2 Week 2/4: Processing that language naturally!

GOAL: Continue the LinkedIn Learning course for NLP!

My plan is still as follows:

Week 1: Start the LinkedIn Learning NLP course

Week 2: Continue/finish the NLP course

Week 3: Finish NLP course if needed and plan how to incorporate NLP into my script

Week 4: Improve my script

(Except due to having to miss class, I’m moving my weeks down, but they’ll still be done in these 4 chunks.)

I’m currently working on Week 2, continue/finish the NLP course. I’ve finally got a good flow for going through these and doing the exercises, so this week I’ll just document what goes well and what doesn’t.

I’ve noticed immediately that it’s probably going to take longer than I thought to finish the course. I have about 3 hrs 40 minutes left on my course, but it takes me about twice as long to do each lesson because I’m starting and stopping it to actually do the exercise. But I am watching at 1.5x speed, and I might do this less in later videos, but I do want to follow along to retain as much as possible. Therefore, it might take all of Week 3 to finish the course, so I only have Week 4 to implement it in my script. That should be plenty of time, since I would probably be adding a single small thing to my script for the purposes of this project.

An added bonus of this course I hadn’t noticed before is the modeling of a workflow for an NLP project. For example, Chapter 1 Exercise 4 takes you through important steps in understanding your data set. I’m learning the process as well as the technical skills, so that’s nice for when I want to apply these skills to my own work. They also outline the big-picture steps for NLP:

  1. Collect raw text
  2. Tokenize
  3. Clean text
  4. Vectorize
  5. Machine learning algorithm
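The first four steps above can be sketched roughly as follows. This is a toy illustration using only the standard library, not the course's code: the stopword set, the sample sentences, and the function names are all assumptions, and the machine learning step is left out.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword set for illustration; NLTK ships a fuller list.
STOPWORDS = {"the", "a", "is", "to", "and", "i"}

def tokenize(text):
    """Step 2: split raw text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def clean(tokens):
    """Step 3: drop stopwords."""
    return [t for t in tokens if t not in STOPWORDS]

def vectorize(docs):
    """Step 4: turn each token list into a row of counts over a shared vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return vocab, rows

# Step 1: collect raw text (made-up sentences).
raw = ["I like the course.", "The course is a good course!"]

docs = [clean(tokenize(t)) for t in raw]
vocab, matrix = vectorize(docs)

print(vocab)
print(matrix)
```

The resulting count matrix, with one row per document and one column per vocabulary word, is the kind of numeric input a machine learning algorithm in step 5 would take.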

A huge RegEx tip I didn’t know is that capitalizing a character class flips what is being searched for. So, if \w looks for word characters (letters, digits, and underscores), capital \W looks for non-word characters. Also, re.findall() and re.split() are the main functions for tokenizing.
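A small sketch of that tip, using Python's built-in `re` module with a made-up sentence: `\w+` with `re.findall()` collects the words, while `\W+` with `re.split()` cuts the text at everything that is not a word character.

```python
import re

# Made-up sample sentence for illustration.
text = "Hello, world! NLP is fun."

# \w+ matches runs of word characters, so findall() returns the words.
words = re.findall(r"\w+", text)

# \W+ matches runs of NON-word characters, so split() cuts at those runs.
pieces = re.split(r"\W+", text)

# findall() with \W+ returns the separators themselves.
separators = re.findall(r"\W+", text)

print(words)
print(pieces)
print(separators)
```

One difference worth noticing: because the sentence ends in punctuation, `re.split()` leaves an empty string at the end of its result, while `re.findall()` does not.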

As the course goes on, more complicated functions are being created. It’s becoming a little harder for me to follow, but it’s definitely still understandable. I’ve also started piecing together parts of complicated functions just by seeing them used a couple of times. Much like inferring vocab and grammar by listening to a language you’re learning, I’m inferring things like how lambda works in Python.

I’m also learning a lot of general Python skills that will improve how I manage data in Python. I’m learning some good practices that I’m excited to apply when I revise my data scraping script.

In regards to using a web-hosted Jupyter notebook instead of a local one, I’ve only encountered one hang-up, where I had to use a downloader for an NLTK package. But otherwise that’s been going great! And it’s nice to have all these notebooks in my Google Drive.

Project 2 Week 1/4: Time to process some language

GOAL: Start the LinkedIn Learning course for NLP!

Oops, I’m doing a completely different thing! It’s natural language processing time. I wanted to work on my TAZ project, and after talking it through I realized introducing NLP into my scraping will help my data set be much more reliable!

As I learned last project, LinkedIn Learning is really helpful for starting a brand new technology. That extra level of guidance is important for something I’ve never worked on before. While I’ve used Python for the TAZ script, I want to improve and supplement it. NLP is new enough for me that I think I need course guidance before I jump in.

My plan is as follows:

Week 1: Start the LinkedIn Learning NLP course

Week 2: Continue/finish the NLP course

Week 3: Finish NLP course if needed and plan how to incorporate NLP into my script

Week 4: Improve my script

I’ve spent a while today clearing out my computer to make room for the downloads the course uses and then installing the correct version of Python for NLTK. I had some difficulties with running locally, so I decided to run through a notebook on Google Colab instead; I just had to make sure that I was importing the exercise files at the start of each notebook. It’s been working really well so far! Hoping I don’t run into any difficulties there.

Otherwise, I’m just going through the course now. I’m also learning more about pandas by working through this, which is nice. So far I have the exact amount of previous Python knowledge needed for this course. It’s at a good pace for my skill level and I feel like I’m understanding everything being said.