okansas.blogspot.com
Occasional thoughts about orienteering


Sunday, December 09, 2012

Playing with training text analysis

 

I read training logs for a number of orienteers at Attackpoint.  It is fun to see how people train.  It is especially fun to read what they write about their training and racing.  Attackpoint makes it pretty easy to get summaries of the basic information about how someone trains.  Say you want to see how Emily Kemp trained in October.  In three mouse clicks you can get a summary with the amount of training by type and a graph showing day-by-day totals.  You also get narrative descriptions like:
my room mates had started hanging their socks on the spokes of my bike so i figured it was time to fix my flat tire and get it back on the road. the arm did fabulously!! hardly any pain at all and it was so amazing to feel the wind through my hair!! :)
And
For a university campus I definitely wasn't expecting so many little passageways and tricky spots. Good thing I've been practising at reading my control descriptions! I never really made any large mistakes however I did feel like I had a lot of hesitations. On the way to control 10 what is marked as an overhang passageway thingy actually goes right through a building. I think I spent a good 5sec with one foot in the doorway trying to figure out if I would be disqualified or not! The top 3 places were super close with Celine coming in first, me 4sec behind, and Isia 1sec behind me. Eek! I definitely put everything out there and don't have many regrets although it would have been nice to find those 4 sec somewhere ;)
The narratives are a lot more fun to read than the base description of a training session.  The base description that goes with the first quote is "velo 41:31  [1]."  The base description that goes with the second quote is "orienteering race 15:17 [3] 2.4 km (6:22/km)."

At work, I've been analyzing written responses to open-ended survey questions using "text mining," and it seems as if that sort of approach might be worth using to look at the narrative portion of an orienteer's training at Attackpoint.

A simple example...

I started by collecting all of the narrative descriptions from my log for the last year of entries.  That gives me 544 small bunches of text that I'd written in my log.  I cleaned up the text by removing numbers, punctuation and extra white space.  I did some further cleaning by taking out the common English words that don't really carry much information.  These "stop words" are terms like "the", "is" and "at".  Finally, I combined words that are essentially forms of the same term.  So, the words "orienteering", "orienteer" and "orienteers" are all treated as the same and are renamed "orient".  Once the text is cleaned up, I can start looking at it.
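Here is a minimal sketch of that sort of cleaning in Python. It isn't the code I actually ran. The stop word list is a tiny made-up sample, and `stem` is a hypothetical, very crude suffix stripper standing in for a real stemmer, so treat the names and the rules as assumptions.

```python
import re

# A tiny illustrative stop word list (an assumption); a real list is much longer.
STOP_WORDS = {"the", "is", "at", "a", "an", "and", "of", "to", "in", "i", "my", "it"}

# Hypothetical suffixes for a crude stemmer, checked from longest to shortest.
SUFFIXES = ("eering", "eers", "eer", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(entry):
    """Turn one log entry into a list of cleaned-up terms."""
    text = entry.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop numbers and punctuation
    words = text.split()                   # splitting also collapses white space
    return [stem(w) for w in words if w not in STOP_WORDS]


print(clean("Orienteering at the lake, 45 minutes!"))
```

A crude stemmer like this one will mangle plenty of words, which is one reason to use an established stemmer for real work.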

I created a list of every word that appears and of how many times that word appeared in each of the 544 entries.  The result is a big table that tells me which words appear or don't appear in each entry.  For example, I know that the 7th entry doesn't include the words "basketball" or "beer" but does include the words "work" and "commute" (and it includes each of those words once).
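The big table can be sketched like this. The three entries are made up for illustration (they aren't from my log), and `build_term_table` and `count_in_entry` are hypothetical helper names.

```python
from collections import Counter

# Made-up, already-cleaned entries: one list of terms per log entry.
entries = [
    ["work", "commute", "bike"],
    ["orient", "map", "fun"],
    ["bike", "work", "bike"],
]


def build_term_table(entries):
    """Return (vocabulary, table); table[i][j] counts vocabulary[j] in entry i."""
    vocabulary = sorted({term for entry in entries for term in entry})
    counts = [Counter(entry) for entry in entries]
    table = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, table


vocabulary, table = build_term_table(entries)


def count_in_entry(term, entry_index):
    """How many times a term appears in one entry (0 if it never appears anywhere)."""
    if term not in vocabulary:
        return 0
    return table[entry_index][vocabulary.index(term)]


print(vocabulary)
print(count_in_entry("work", 0), count_in_entry("beer", 0))
```

With 544 entries most cells in a table like this are zero, so a real tool would likely store it in a sparse form.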

You can start to look at the entire year of training and find the words that show up most often.  For example, among the 48 terms that show up at least 30 times are: bike, fun, jog, map, mtb, orient[eer], train, warm and work.
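Finding the frequent terms is just a matter of adding up the counts over all entries and keeping the terms above a threshold. A small sketch, again with made-up entries and a hypothetical function name:

```python
from collections import Counter


def frequent_terms(entries, min_count):
    """Terms whose total count over all entries is at least min_count."""
    totals = Counter(term for entry in entries for term in entry)
    return sorted(term for term, n in totals.items() if n >= min_count)


# Made-up, already-cleaned entries for illustration.
sample_entries = [
    ["bike", "work"],
    ["bike", "fun"],
    ["bike", "map", "fun"],
]

print(frequent_terms(sample_entries, 2))
```

For my log the threshold would be 30 rather than 2.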

You can also figure out how different words are correlated with each other.  Take a term like "compass" and calculate the terms most strongly correlated with compass, which include:

Carolina
Cleaner
Sharpen
Straight
North

I know why those terms are correlated (I ran a race in North Carolina without a compass because I feel like my navigation is sharper when I run without a compass - especially if I haven't recently done much O' technique training).
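One way to calculate that kind of association is sketched below. I'm assuming plain Pearson correlation between the per-entry counts of two terms, which may not be exactly what any particular text mining tool does. The entries are made up, and `pearson` and `correlated_terms` are hypothetical names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_terms(entries, target, min_corr=0.5):
    """Terms whose per-entry counts correlate with the target's, strongest first."""
    vocabulary = sorted({term for entry in entries for term in entry})
    if target not in vocabulary:
        return []
    column = {t: [entry.count(t) for entry in entries] for t in vocabulary}
    scores = [(t, pearson(column[target], column[t])) for t in vocabulary if t != target]
    return sorted((p for p in scores if p[1] >= min_corr), key=lambda p: -p[1])


# Made-up, already-cleaned entries for illustration.
demo_entries = [
    ["compass", "north", "carolina"],
    ["bike", "work"],
    ["compass", "north"],
    ["bike", "trail"],
]

for term, r in correlated_terms(demo_entries, "compass"):
    print(term, round(r, 2))
```

A high correlation only says the words tend to show up in the same entries. You still need to know the story behind the entries to make sense of it.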

It can be fun to explore the text by looking for correlated terms.  I do a lot of my running and biking at Clinton Lake.  Here are some terms correlated with Clinton: trail, run and snake.

Playing around with the text is fun and I suspect that it could even be useful once I've learned more about how to do "text mining."  One of the really easy things to do with text data is to create a word cloud.  Here's the word cloud of my last 365 days of training log narrative:




posted by Michael | 11:19 AM
