Parsing HTML, XML, JSON
Generally speaking, the best and easiest way to parse HTML and XML is to use the Nokogiri library.
  • To install Nokogiri
    gem install nokogiri

HTML

Here we'll use Nokogiri to list the table of contents from http://rubyfu.net/content/

Using CSS selectors

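    require 'nokogiri'
    require 'open-uri'

    # URI.open comes from open-uri; a bare Kernel#open no longer fetches URLs on Ruby 3.0+
    page = Nokogiri::HTML(URI.open("http://rubyfu.net/content/"))

    # Select every summary entry (links and plain spans) and print its cleaned-up text
    selector = ".book .book-summary ul.summary li a, .book .book-summary ul.summary li span"
    page.css(selector).each { |node| puts node.text.strip.squeeze(" ").gsub("\n", '') }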
Returns
    RubyFu
    Module 0x0 | Introduction
    0.1. Contribution
    0.2. Beginners
    0.3. Required Gems
    1. Module 0x1 | Basic Ruby Kung Fu
    1.1. String
    1.1.1. Conversion
    1.1.2. Extraction
    1.2. Array
    2. Module 0x2 | System Kung Fu
    2.1. Command Execution
    2.2. File manipulation
    2.2.1. Parsing HTML, XML, JSON
    2.3. Cryptography
    2.4. Remote Shell
    2.4.1. Ncat.rb
    2.5. VirusTotal
    3. Module 0x3 | Network Kung Fu
    3.1. Ruby Socket
    3.2. FTP
    3.3. SSH
    3.4. Email
    3.4.1. SMTP Enumeration
    3.5. Network Scanning
    .
    .
    ..snippet..

XML

We'd like to show two ways here: REXML, which ships with Ruby, and the external Nokogiri library.
We have the following XML file (saved as file.xml in the examples that follow)
    <?xml version="1.0"?>
    <collection shelf="New Arrivals">
      <movie title="Enemy Behind">
        <type>War, Thriller</type>
        <format>DVD</format>
        <year>2003</year>
        <rating>PG</rating>
        <stars>10</stars>
        <description>Talk about a US-Japan war</description>
      </movie>
      <movie title="Transformers">
        <type>Anime, Science Fiction</type>
        <format>DVD</format>
        <year>1989</year>
        <rating>R</rating>
        <stars>8</stars>
        <description>A scientific fiction</description>
      </movie>
      <movie title="Trigun">
        <type>Anime, Action</type>
        <format>DVD</format>
        <episodes>4</episodes>
        <rating>PG</rating>
        <stars>10</stars>
        <description>Vash the Stampede!</description>
      </movie>
      <movie title="Ishtar">
        <type>Comedy</type>
        <format>VHS</format>
        <rating>PG</rating>
        <stars>2</stars>
        <description>Viewable boredom</description>
      </movie>
    </collection>

REXML

    require 'rexml/document'
    include REXML

    # Read the XML file and parse it into a REXML document
    xmlfile = File.read("file.xml")
    xmldoc = Document.new(xmlfile)

    # Get the root element
    root = xmldoc.root
    puts "Root element : " + root.attributes["shelf"]

    # List of movie titles
    xmldoc.elements.each("collection/movie") do |e|
      puts "Movie Title : " + e.attributes["title"]
    end

    # List of movie types
    xmldoc.elements.each("collection/movie/type") do |e|
      puts "Movie Type : " + e.text
    end

    # List of movie descriptions
    xmldoc.elements.each("collection/movie/description") do |e|
      puts "Movie Description : " + e.text
    end

    # List of movie stars
    xmldoc.elements.each("collection/movie/stars") do |e|
      puts "Movie Stars : " + e.text
    end
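
Beyond printing each field, the parsed elements can be collected into ordinary Ruby structures. This is a minimal sketch, assuming a shortened inline copy of the XML above so it runs without file.xml; the variable names `ratings` and `top_rated` are illustrative only.

```ruby
require 'rexml/document'

# Shortened copy of the XML above, inlined so the sketch is self-contained
xml = <<~XML
  <?xml version="1.0"?>
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><stars>10</stars></movie>
    <movie title="Transformers"><stars>8</stars></movie>
    <movie title="Trigun"><stars>10</stars></movie>
    <movie title="Ishtar"><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Build a { title => stars } hash from every <movie> element
ratings = doc.elements.to_a("collection/movie").to_h do |movie|
  [movie.attributes["title"], movie.elements["stars"].text.to_i]
end

# Keep only the titles rated 10
top_rated = ratings.select { |_title, stars| stars == 10 }.keys

puts top_rated
```

Filtering in plain Ruby after extraction avoids relying on XPath predicates, at the cost of loading every movie first.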

Nokogiri

    require 'nokogiri'

Slop

    require 'nokogiri'

    # Parse XML file
    file = File.read("file.xml")
    doc = Nokogiri::Slop(file)

    puts doc.search("type").map { |t| t.text }         # List of Types
    puts doc.search("format").map { |f| f.text }       # List of Formats
    puts doc.search("year").map { |y| y.text }         # List of Years
    puts doc.search("rating").map { |r| r.text }       # List of Ratings
    puts doc.search("stars").map { |s| s.text }        # List of Stars
    puts doc.search("description").map { |d| d.text }  # List of Descriptions

JSON

Assume you have a small vulnerability database in a JSON file, as follows
    {
      "Vulnerability":
      [
        {
          "name": "SQLi",
          "details:":
          {
            "full_name": "SQL injection",
            "description": "An injection attack wherein an attacker can execute malicious SQL statements",
            "references": [
              "https://www.owasp.org/index.php/SQL_Injection",
              "https://cwe.mitre.org/data/definitions/89.html"
            ],
            "type": "web"
          }
        }
      ]
    }
To parse it
    require 'json'
    vuln_json = JSON.parse(File.read('vulnerabilities.json'))
Returns a hash
    {"Vulnerability"=>
      [{"name"=>"SQLi",
        "details:"=>
         {"full_name"=>"SQL injection",
          "description"=>"An injection attack wherein an attacker can execute malicious SQL statements",
          "references"=>["https://www.owasp.org/index.php/SQL_Injection", "https://cwe.mitre.org/data/definitions/89.html"],
          "type"=>"web"}}]}
Now you can retrieve data from it as you would from any hash
    vuln_json["Vulnerability"].each {|vuln| puts vuln['name']}
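
Nested values can be reached the same way. The following is a minimal sketch, assuming a trimmed inline copy of the JSON above so it runs without the file; it uses `dig` (available on Hash and Array) to walk several levels at once. Note that the key in this database is literally `"details:"`, trailing colon included.

```ruby
require 'json'

# Trimmed copy of the JSON document above
raw = <<~JSON
  {
    "Vulnerability": [
      {
        "name": "SQLi",
        "details:": {
          "full_name": "SQL injection",
          "references": [
            "https://www.owasp.org/index.php/SQL_Injection",
            "https://cwe.mitre.org/data/definitions/89.html"
          ],
          "type": "web"
        }
      }
    ]
  }
JSON

vuln_json = JSON.parse(raw)
sqli = vuln_json["Vulnerability"].first

# dig walks nested keys and returns nil instead of raising when one is missing
full_name = sqli.dig("details:", "full_name")
first_ref = sqli.dig("details:", "references", 0)
missing   = sqli.dig("details", "full_name") # key without the colon does not exist

puts full_name
puts first_ref
p missing
```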
If you want to add to this database, just create a hash with the same structure.
    xss = {"name"=>"XSS", "details:"=>{"full_name"=>"Cross Site Scripting", "description"=>"A type of computer security vulnerability typically found in web applications", "references"=>["https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)", "https://cwe.mitre.org/data/definitions/79.html"], "type"=>"web"}}
You can convert it to JSON with the `.to_json` method, which is available once `json` has been required
    xss.to_json
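
To persist a new entry, the hash can be appended to the parsed database and written back out. This is a rough sketch under stated assumptions: the entries are trimmed to their names, and a temporary file stands in for vulnerabilities.json so nothing real is overwritten.

```ruby
require 'json'
require 'tempfile'

# Trimmed stand-ins for the parsed database and the new entry
db  = { "Vulnerability" => [{ "name" => "SQLi" }] }
xss = { "name" => "XSS" }

# Append the new entry to the existing list
db["Vulnerability"] << xss

# Write the database to a temporary file and read it back to confirm the round trip
names = Tempfile.create(["vulnerabilities", ".json"]) do |f|
  f.write(JSON.pretty_generate(db)) # pretty_generate emits indented, readable JSON
  f.flush
  reloaded = JSON.parse(File.read(f.path))
  reloaded["Vulnerability"].map { |vuln| vuln["name"] }
end

puts names
```

`JSON.pretty_generate` is used here instead of `to_json` only because the indented output is easier to review by hand; both produce valid JSON.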