Natural Language Processing in Ruby

How to parse ‘go’
Tom Cartwright
@tomcartwrightuk

keepmebooked
giveaiddirect.com

Python, surely?
Yes. The NLTK is awesome.
But you have a Ruby-based app.

Extracting meaning from human input
Summarisation
Extracting entities
Tagging text
Sentiment analysis
Filtering text

From document level to word level
document → sentence → word → example

Chunking & segmenting
Breaking text into paragraphs, sentences and other zones
Start with a document/some text:
“The second nonabsolute number is the given time of
arrival, which is now known to be one of those most bizarre
of mathematical concepts, a recipriversexclusion, a number
whose existence can only be defined as being anything other
than itself…”

Punkt sentence tokenizer to the rescue….

require 'punkt-segmenter'

text = "The second nonabsolute number is the given time of arrival..."
tokenizer = Punkt::SentenceTokenizer.new(text)
result = tokenizer.sentences_from_text(text, :output => :sentences_text)
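For comparison, here is a minimal sketch of what sentence splitting looks like without a trained segmenter. It is plain Ruby with no gems, and `naive_sentences` is a hypothetical helper, not part of Punkt. It splits on whitespace that follows `.`, `!` or `?`, which works for simple text but trips over abbreviations.

```ruby
# Naive sentence splitter: break on whitespace that follows ., ! or ?
# (hypothetical helper for illustration; not part of the punkt-segmenter gem)
def naive_sentences(text)
  text.split(/(?<=[.!?])\s+/)
end

naive_sentences("Time is an illusion. Lunchtime doubly so.")
#=> ["Time is an illusion.", "Lunchtime doubly so."]

# The abbreviation is wrongly treated as the end of a sentence.
naive_sentences("Dr. Dent missed lunch.")
#=> ["Dr.", "Dent missed lunch."]
```

Cases like the second one are the reason to reach for a segmenter that knows about abbreviations rather than a single regex.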

Training

trainer = Punkt::Trainer.new
trainer.train(bistromatic_text)

Tokenising
Breaking text into words, phrases and symbols.
"Time is an illusion. Lunchtime doubly so.".split(" ")
#=> ["Time", "is", "an", "illusion.", "Lunchtime", "doubly", "so."]

Tokenizer gem
Regexes and rules
class Tokenizer
  FS = Regexp.new('[[:blank:]]+')
  PAIR_PRE = ['(', '{', '[']
  SIMPLE_POST = ['!', '?', ',', ':', ';', '.']
  PAIR_POST = [')', '}', ']']
  PRE_N_POST = ['"', "'"]
  …

tokenizer = Tokenizer::Tokenizer.new
tokenizer.tokenize("Time is an illusion. Lunchtime doubly so.")
#=> ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
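The difference between `split(" ")` and a rule-based tokenizer can be sketched in one line of plain Ruby. This is only an illustration of the idea, not how the Tokenizer gem is implemented; `simple_tokenize` is a hypothetical name.

```ruby
# Minimal regex tokenizer: a token is either a run of word characters
# or a single non-space, non-word character (i.e. punctuation).
# Hypothetical helper; the Tokenizer gem uses its own rule lists.
def simple_tokenize(text)
  text.scan(/\w+|[^\w\s]/)
end

simple_tokenize("Time is an illusion. Lunchtime doubly so.")
#=> ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
```

Unlike the `split` version, the full stops come out as separate tokens, so "illusion." and "illusion" are no longer treated as different words.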

Stemming
Jogging => Jog
"jogging".gsub(/.ing/, "")
#=> "jog"

"bring".gsub(/.ing/, "")
#=> "b"
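A slightly more careful suffix stripper avoids the "bring" problem. The sketch below is loosely modelled on a single step of the Porter algorithm (strip "ing" only if a vowel remains, then collapse a doubled final consonant). It is an assumption-laden illustration in plain Ruby, not the code of either gem, and `naive_stem` is a hypothetical name.

```ruby
# Strip a trailing "ing" only when the remaining stem still contains a
# vowel, then collapse a doubled final consonant (except l, s, z).
# Treating "y" as a vowel everywhere is a simplification.
def naive_stem(word)
  return word unless word.end_with?("ing")

  stem = word[0...-3]
  return word unless stem =~ /[aeiouy]/        # "br" has no vowel, keep "bring"

  stem = stem[0...-1] if stem =~ /([^aeiouylsz])\1\z/  # "jogg" -> "jog"
  stem
end

naive_stem("jogging")      #=> "jog"
naive_stem("bring")        #=> "bring"
naive_stem("programming")  #=> "program"
```

A real stemmer has many more rules than this, which is the argument for using a library.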

1. Ruby-Stemmer: multi-language Porter stemmer
2. Text: Porter stemmer

stemmer = Lingua::Stemmer.new(:language => "en")
stemmer.stem("programming") #=> program
stemmer.stem("vimming") #=> vim

Parts-of-speech tagging

Tag   Meaning                     Examples
CC    conjunction                 and, but
DET   determiner                  this, some
IN    preposition / conjunction   above, about
JJ    adjective                   orange, tiny
NNP   proper noun                 Camden Pale Ale

A couple of methods

Regex tagger
/.*ing$/ => VBG
/.*ed$/ => VBD

Lookup on words
E.g.
calculating: { VBG: 6 }
orange: { JJ: 2, NN: 5 }
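The two methods combine naturally: look the word up first, and fall back to suffix rules for unknown words. The following is a minimal sketch under stated assumptions. The lexicon holds only the two entries from the slide, `tag` is a hypothetical helper, and defaulting to "NN" for anything unmatched is my assumption rather than something from the talk.

```ruby
# Tag counts per word, as on the slide (a real lexicon would be far larger).
LEXICON = {
  "calculating" => { "VBG" => 6 },
  "orange"      => { "JJ" => 2, "NN" => 5 }
}.freeze

# Suffix fallback rules for words missing from the lexicon.
SUFFIX_RULES = [
  [/ing\z/, "VBG"],
  [/ed\z/,  "VBD"]
].freeze

def tag(word)
  counts = LEXICON[word.downcase]
  # Known word: pick the tag seen most often.
  return counts.max_by { |_tag, n| n }.first if counts

  # Unknown word: first matching suffix rule, else assume a noun.
  rule = SUFFIX_RULES.find { |pattern, _tag| word =~ pattern }
  rule ? rule.last : "NN"
end

tag("orange")   #=> "NN"
tag("jogging")  #=> "VBG"
tag("walked")   #=> "VBD"
```

Note that "orange" always comes out as NN here because the lookup ignores context, which is the weakness the taggers on the next slide try to address.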

A tale of two taggers

EngTagger
• Probabilistic (uses the lookup table from the previous slide)
• Pure Ruby

rb-brill-tagger
• Rule based
• C extensions
• Brown corpus trained

Treat gem
Bundles many of the gems shown
Wraps them in a DSL
s = sentence("A really good sentence.")
s.do(:chunk, :segment, :tokenize, :parse)

Stemming; tokenising; chunking; serialising; tagging; text extraction from PDFs and HTML.

LRUG Sentiments
A tag: {NN}
Pass in a regex => /({JJ}|{JJS})({NNS}|{NNP})/
And some tagged tokens
#=> [(Word @tag="JJ", @text="jolly"),
(Word @tag="NN", @text="face")]
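One way to run a regex over tags is to join the tags into a single string, match against that, and map each match back to the tokens it covers. This is a sketch of the idea only, not the talk's actual code. `Word` and `match_tag_pattern` are hypothetical names, the braces are escaped to be safe, and `{NN}` is added to the noun alternatives so the example token "face" matches.

```ruby
Word = Struct.new(:tag, :text)

# Returns the groups of consecutive words whose tags match the pattern.
# Assumes the pattern starts and ends on whole "{TAG}" units.
def match_tag_pattern(words, pattern)
  tag_string = words.map { |w| "{#{w.tag}}" }.join

  # Character offset in tag_string where each word's "{TAG}" begins.
  offsets = []
  pos = 0
  words.each do |w|
    offsets << pos
    pos += w.tag.length + 2
  end

  results = []
  start = 0
  while (m = pattern.match(tag_string, start))
    first = offsets.index(m.begin(0))
    last  = offsets.index(m.end(0)) || words.length  # nil means end of string
    results << words[first...last]
    start = m.end(0)
  end
  results
end

ADJ_NOUN = /(\{JJ\}|\{JJS\})(\{NN\}|\{NNS\}|\{NNP\})/

words = [Word.new("DET", "a"), Word.new("JJ", "jolly"), Word.new("NN", "face")]
match_tag_pattern(words, ADJ_NOUN).map { |group| group.map(&:text) }
#=> [["jolly", "face"]]
```

Matching on tags rather than words means one pattern covers every adjective-noun pair, whatever the actual words are.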

Sentimental value

 1.0      epic
 1.0      good
 0.21875  chance
 0.21875  brisk
-1.0      slanderous
-1.0      piteous
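Given a table like this, scoring a piece of text can be as simple as adding up the values of the words it contains. The sketch below uses only the six values from the slide. Summing (rather than averaging or weighting) and scoring unknown words as 0.0 are assumptions for illustration, and `sentiment_score` is a hypothetical name.

```ruby
# Word => sentiment value, taken from the slide.
SENTIMENT = {
  "epic"       => 1.0,
  "good"       => 1.0,
  "chance"     => 0.21875,
  "brisk"      => 0.21875,
  "slanderous" => -1.0,
  "piteous"    => -1.0
}.freeze

# Sum the values of known words; unknown words contribute nothing.
def sentiment_score(tokens)
  tokens.sum(0.0) { |token| SENTIMENT.fetch(token.downcase, 0.0) }
end

sentiment_score(%w[a good chance])                 #=> 1.21875
sentiment_score(%w[Epic talk slanderous content])  #=> 0.0
```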

Results

• Ruby
• Practical Object-Oriented Design in Ruby
• Doctors
• Lrug
• recruiters (!)

• dedicated servers
• pdfs
• Surrey

• unsolicited phone calls from r********s
• clients
• Paypal
• XML
• geeks

Gems
Text - Paul Battley’s box of tricks
Treat
Tokenizer
Punkt segmenter
Chronic - for extracting dates

Other things you can do/I didn’t talk about
Calculate text edit distance
Extract entities using the Stanford libraries via the RJB
Extract topic words (LDA)
Keyword extraction - TF-IDF
JRuby
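As an illustration of the first item, text edit distance is usually computed as Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The following is a plain-Ruby sketch of the standard dynamic-programming approach, not the implementation in any of the gems above; `edit_distance` is a hypothetical name.

```ruby
# Levenshtein distance, keeping only the previous row of the DP table.
def edit_distance(a, b)
  prev = (0..b.length).to_a          # distance from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                       # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,                 # deletion
        curr[j - 1] + 1,             # insertion
        prev[j - 1] + cost           # substitution (or match)
      ].min
    end
    prev = curr
  end
  prev.last
end

edit_distance("kitten", "sitting")  #=> 3
```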

Thank you for processing.
Questions?
@tomcartwrightuk

Thanks to Tim Cowlishaw and the HT dev
team for specialised rubber duck support
