Thing Two drew this for me this morning. I love her vivid imagination!
In a recent comparison of text extraction algorithms, Gravity’s open source project, Goose, tied for second place and was even written up over at Read Write Web! I find this very exciting because our project is still quite young and in active development, whereas the algorithms in close standing are mostly well established and semi-finalized. Another interesting point is that most of the competition was built by teams of researchers, you know… Doctors in their fields!
The graph below from Tomaž Kovačič’s study shows only a small amount of the data he collected in his analysis. If you are curious about how he compared these algorithms, I highly recommend you head over to his post. He does a great job exposing the details behind his analysis.
So what is Goose used for at Gravity and why have we open sourced it?
Goose’s wiki provides a very detailed explanation of what Goose is and how it works, and also touches on the original need we had at Gravity behind its creation. Jim Plush wrote the first version from the ground up on his own and only recently gave me commit access to the repository. By the time I got into the project, it had all the bells and whistles required to compete in the analysis completed by Kovačič. My contributions to Goose have been to extend it to allow for more specific extraction of additional metadata outside of the primary content, and they have no effect on its standings above.
Such a utility can be applied to a wide variety of web content analysis problems, and I’m really glad Plush decided to share it with the rest of the open source community. At Gravity, we have been building a lot of exciting (to me at least) technology, and most of it is held dearly by us and needs to remain a company secret, as it makes up a large part of our company’s overall value. When it comes to analyzing the content out here on the web, Goose can be looked at as our trusty messenger, delivering our system plenty of content to analyze without a lot of the noise that comes along with it on the web pages the content is sourced from.
If you are looking to mine some of the golden nuggets of information that are buried under a ton of ads, peripheral links, site menu structures, and other distracting noise, then why not take a look at what Goose has to offer? If you find anything you think Goose may be lacking or have some ideas on anything else that may be improved, let us know on our GitHub repository: https://github.com/jiminoc/goose
So last night while I was traveling home via public transit, I was also trying to keep in contact with my wife via instant messaging. This is a common practice for my wife and me so that I can enjoy my trip more and she can know that I am safe and getting closer to home.
Well things seemed fairly normal in the conversations I was having except it did seem that she was having a harder time than usual understanding what I was saying. I had a long wait between buses and I was getting pretty hungry, so I started instant messaging questions about my dinner options for home. In come responses from my wife that they had “GoodStuff” for dinner and when I asked if there was any left for me, a resounding “Yes!” with smileys came back. I knew that they must have finished eating hours ago, so I made a request for her to start reheating it so that I could eat quickly and then move on to doing bedtime for our two daughters. I was so happy when a quick IM response came back saying: “Sure! OK!” and again a long line of various smileys.
More small chat continued until I arrived home to find an empty table and empty stove. Although it was nice to see that everything was so neat and clean, I was a little disappointed that there was no hot dinner for me after the IM conversation we had just had. I then noticed that my wife was busy with our laundry and my 7-year-old daughter (aka: Thing Two) was next to her holding my wife’s phone (we use our phones for instant messaging). Not only was Thing Two happy to see me, she was also laughing a lot more than usual. I asked my wife about the IM conversation we had moments ago, and she looked a little confused. This is when Thing Two jumped up and said: “I fooled you Daddy! You thought I was Mommy!” We all had quite a laugh.
It was just shocking that I was not able to notice the difference. My wife tends to be very terse in her IM communication, so it did not seem odd for me to ask a long question and then receive a small “ok” response. Boy, has Thing Two come a long way in her pranks. I’m both proud of her and a little scared of what we’re in for as well.
LOL O_o
Things have been so exciting working @gravity this year and I am amazed at how much we have accomplished in the last few months. All of this excitement and hard work has worn my not-twenty-something body out just a tad and my family has become quite used to my MacBook on my lap during weekend afternoons.
So… how nice is it that we all get a 3-day weekend to relax and spend some quality time with those people (and/or cats) in our lives who do not work at Gravity? I have also been able to work on this blog once again. After a nice day out with the family on this beautiful spring day, I found myself curious about what I needed to do next… NOTHING!
How great is that? I was really dumbfounded for a while until my brain chimed in with a wonderful suggestion: Relax! Well that is just what I am doing right now.
Here’s to having a moment to relax and accepting it! Hope you get a moment or two to yourself.
peace.

1,317 of 4,947 songs added... OH HELL YEAH!
I just gained access to Google’s Music Beta and for the first time, I think my personal music library may be smaller than the amount of cloud storage available for free!
It is truly amazing just how far we have come from the early days of cramming MP3s into a JPEG image to store on a free image hosting site back in the ’90s. Napster not only broadened what was possible for music online, but also inadvertently set us all back a decade of fighting to truly OWN the music we legally purchase from the big music labels.
Yes, I know there are a lot of you that have second thoughts about giving so much to the Google Collective, and I have no beef with you and your own convictions.
I for one welcome our new online music overlords!
This is the first time that the creator of my programming language of choice has sent me an email.
Scala 2.9 was just released this week and the development team at Gravity is working to migrate our code base onto it. Just after our first attempt to run our unit tests, we hit a bug that we could not code around and hit the forums for answers. I found an already reported bug that matched our case as well and jumped on the ticket to receive updates. Later that day I saw a comment on it from Martin Odersky (the original author of Scala) himself. That was exciting enough for me, but the email… WOW.
…okay, I can go back to my day now.
Amazingly, just one day after I installed and configured this new WordPress blog, a search on Google for my name, Robbie Coleman, returns this site as the very first result!




