Extraction

String extraction is one of the main tasks that all programmers need. It's often difficult because we don't get an easy string presentation from which to extract useful data/information. Here are some helpful Ruby string-extraction cases.

Extracting Network Strings

Extracting MAC address from string

We need to extract all MAC addresses from an arbitrary string

mac = "ads fs:ad fa:fs:fe: Wind00-0C-29-38-1D-61ows 1100:50:7F:E6:96:20dsfsad fas fa1 3c:77:e6:68:66:e9 f2"

Using Regular Expressions

This regular expression should support Windows and Linux MAC address formats.

Lets to find our mac

mac_regex = /(?:[0-9A-F][0-9A-F][:\-]){5}[0-9A-F][0-9A-F]/i
mac.scan mac_regex

Returns

["00-0C-29-38-1D-61", "00:50:7F:E6:96:20", "3c:77:e6:68:66:e9"]

Extracting IPv4 address from string

We need to extract all IPv4 addresses from an arbitrary string

ip = "ads fs:ad fa:fs:fe: Wind10.0.4.5ows 11192.168.0.15dsfsad fas fa1 20.555.1.700 f2"
ipv4_regex = /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/

Let's find our IPs

ip.scan ipv4_regex

Returns

[["10", "0", "4", "5"], ["192", "168", "0", "15"]]

Extracting IPv6 address from string

ipv6_regex = /^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/

Extracting Web Strings

Extracting URLs from a file

Assume we have the following string

string = "text here http://foo1.example.org/bla1 and http://foo2.example.org/bla2 and here mailto:test@example.com and here also."

Using Regular Expressions

string.scan(/https?:\/\/[\S]+/)

Using standard URI module This returns an array of URLs

require 'uri'
URI.extract(string, ["http" , "https"])

Extracting URLs from web page

Using above tricks

require 'net/http'
URI.extract(Net::HTTP.get(URI.parse("http://rubyfu.net")), ["http", "https"])

or using a regular expression

require 'net/http'
Net::HTTP.get(URI.parse("http://rubyfu.net")).scan(/https?:\/\/[\S]+/)

Extracting email addresses from web page

email_regex = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
require 'net/http'
Net::HTTP.get(URI.parse("http://isemail.info/_system/is_email/test/?all")).scan(email_regex).uniq

Extracting strings from HTML tags

Assume we have the following HTML contents and we need to get strings only and eliminate all HTML tags

string = "<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is another <strong>contents</strong>.</p>
</body>
</html>"
puts string.gsub(/<.*?>/,'').strip

Returns

Page Title
This is a Heading
This is another contents.

Parsing colon separated data from a file

During a pentest, you may need to parse text that has a very common format as follows

description : AAAA
info : BBBB
info : CCCC
info : DDDD
solution : EEEE
solution : FFFF
reference : GGGG
reference : HHHH
see_also : IIII
see_also : JJJJ

The main idea is to remove repeated keys and pass to one key with an array of values.

#!/usr/bin/env ruby
#
# KING SABRI | @KINGSABRI
# Usage:
# ruby noawk.rb file.txt
#
file = File.read(ARGV[0]).split("\n")
def parser(file)
hash = {} # Datastore
splitter = file.map { |line| line.split(':', 2) }
splitter.each do |k, v|
k.strip! # remove leading and trailing whitespaces
v.strip! # remove leading and trailing whitespaces
if hash[k] # if this key exists
hash[k] << v # add this value to the key's array
else # if not
hash[k] = [v] # create the new key and add an array contains this value
end
end
hash # return the hash
end
parser(file).each {|k, v| puts "#{k}:\t#{v.join(', ')}"}

For one-liner lovers

ruby -e 'h={};File.read("text.txt").split("\n").map{|l|l.split(":", 2)}.map{|k, v|k.strip!;v.strip!; h[k] ? h[k] << v : h[k] = [v]};h.each {|k, v| puts "#{k}:\t#{v.join(", ")}"}'