Generally speaking, the best and easiest way to parse HTML and XML is to use the Nokogiri library.

  • To install Nokogiri

    gem install nokogiri


Here we'll use Nokogiri to extract our contents list from the book's web page.

Using CSS selectors

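require 'nokogiri'
require 'open-uri'

# The URL is left empty here; pass the address of the page you want to parse.
page = Nokogiri::HTML(URI.open(""))
# Select every link and span in the summary list and print its cleaned-up text.
# squeeze(' ') collapses repeated spaces only; a bare squeeze would also collapse double letters.
page.css(".book .book-summary ul.summary li a, .book .book-summary ul.summary li span").each do |node|
  puts node.text.strip.squeeze(' ').gsub("\n", '')
end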


Module 0x0 | Introduction
0.1.  Contribution
0.2.  Beginners
0.3.  Required Gems
1.  Module 0x1 | Basic Ruby Kung Fu
1.1.  String
1.1.1.  Conversion
1.1.2.  Extraction
1.2.  Array
2.  Module 0x2 | System Kung Fu
2.1.  Command Execution
2.2.  File manipulation
2.2.1.  Parsing HTML, XML, JSON
2.3.  Cryptography
2.4.  Remote Shell
2.4.1.  Ncat.rb
2.5.  VirusTotal
3.  Module 0x3 | Network Kung Fu
3.1.  Ruby Socket
3.2.  FTP
3.3.  SSH
3.4.  Email
3.4.1.  SMTP Enumeration
3.5.  Network Scanning


There are two ways we'd like to show here: the REXML library that ships with Ruby and the external Nokogiri library.

We have the following XML file

<?xml version="1.0"?>
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <description>A scientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <description>Viewable boredom</description>
</movie>
</collection>


require 'rexml/document'
include REXML

file = "file.xml"
# Parse the XML file into a document
xmldoc = Document.new(File.read(file))

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# List of movie titles.
xmldoc.elements.each("collection/movie") do |e|
  puts "Movie Title : " + e.attributes["title"]
end

# List of movie types.
xmldoc.elements.each("collection/movie/type") do |e|
  puts "Movie Type : " + e.text
end

# List of movie descriptions.
xmldoc.elements.each("collection/movie/description") do |e|
  puts "Movie Description : " + e.text
end

# List of movie stars (prints nothing when no movie has a stars element).
xmldoc.elements.each("collection/movie/stars") do |e|
  puts "Movie Stars : " + e.text
end
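
REXML can also run XPath queries directly, which is handy when you want a filtered result rather than a loop over every element. The following is a minimal sketch using an inline XML string shaped like the collection file above, so it runs without `file.xml`; the variable names are illustrative.

```ruby
require 'rexml/document'

# Hypothetical inline XML, a shortened copy of the collection above.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
      <type>War, Thriller</type>
    </movie>
    <movie title="Trigun">
      <type>Anime, Action</type>
    </movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# XPath.match returns an array with every matching element.
titles = REXML::XPath.match(doc, '//movie').map { |m| m.attributes['title'] }

# XPath.first returns only the first match; the predicate filters on an attribute.
trigun_type = REXML::XPath.first(doc, "//movie[@title='Trigun']/type").text

puts titles.inspect
puts trigun_type
```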


require 'nokogiri'

# Parse XML file
file = "file.xml"
doc = Nokogiri::Slop(File.read(file))

puts doc.search("type").map { |t| t.text }            # List of Types
puts doc.search("format").map { |f| f.text }          # List of Formats
puts doc.search("year").map { |y| y.text }            # List of Years
puts doc.search("rating").map { |r| r.text }          # List of Ratings
puts doc.search("stars").map { |s| s.text }           # List of Stars
puts doc.search("description").map { |d| d.text }     # List of Descriptions


Assume you have a small vulnerability database in a JSON file like the following

{
  "Vulnerability": [
    {
      "name": "SQLi",
      "details:": {
          "full_name": "SQL injection",
          "description": "An injection attack wherein an attacker can execute malicious SQL statements",
          "references": ["", ""],
          "type": "web"
      }
    }
  ]
}

To parse it

require 'json'
vuln_json = JSON.parse(File.read('vulnerabilities.json'))

Returns a hash

{"Vulnerability"=>
  [{"name"=>"SQLi",
    "details:"=>
     {"full_name"=>"SQL injection",
      "description"=>"An injection attack wherein an attacker can execute malicious SQL statements",
      "references"=>["", ""],
      "type"=>"web"}}]}

Now you can retrieve any data as you would with a hash

vuln_json["Vulnerability"].each {|vuln| puts vuln['name']}
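
Nested values can be reached the same way. The sketch below assumes the layout used in this section, where each entry keeps its fields under a `details:` key (with the trailing colon), and parses an inline string so it runs without `vulnerabilities.json`.

```ruby
require 'json'

# Hypothetical inline copy of the database, trimmed to a few fields.
raw = <<~JSON
  {
    "Vulnerability": [
      {
        "name": "SQLi",
        "details:": {
          "full_name": "SQL injection",
          "type": "web"
        }
      }
    ]
  }
JSON

vuln_json = JSON.parse(raw)

vuln_json['Vulnerability'].each do |vuln|
  # dig walks nested keys and returns nil instead of raising when one is missing.
  full_name = vuln.dig('details:', 'full_name')
  type      = vuln.dig('details:', 'type')
  puts "#{vuln['name']}: #{full_name} (#{type})"
end

first = vuln_json['Vulnerability'].first
```

Using `dig` rather than chained `[]` calls avoids a `NoMethodError` when an entry has no `details:` hash.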

If you want to add to this database, just create a hash with the same structure.

xss = {"name"=>"XSS", "details:"=>{"full_name"=>"Cross Site Scripting", "description"=>"XSS is a type of computer security vulnerability typically found in web applications", "references"=>["", ""], "type"=>"web"}}

You can convert it to JSON just by calling the `.to_json` method, which is available once `json` is required.
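
As a minimal sketch of that conversion, the example below appends a new entry to an in-memory database and serializes it. The `db` hash and its trimmed-down fields are assumptions for illustration; `to_json`, `JSON.pretty_generate` and `JSON.parse` are part of Ruby's `json` library.

```ruby
require 'json'

# Hypothetical in-memory database with one existing entry.
db = { 'Vulnerability' => [{ 'name' => 'SQLi', 'details:' => { 'type' => 'web' } }] }

# New entry built with the same structure, then appended.
xss = { 'name' => 'XSS', 'details:' => { 'full_name' => 'Cross Site Scripting', 'type' => 'web' } }
db['Vulnerability'] << xss

compact = xss.to_json               # single-line JSON string
pretty  = JSON.pretty_generate(db)  # indented output, easier to read in a file

puts compact
puts pretty

# Parsing the generated text gives back a hash equal to the original.
round_trip = JSON.parse(pretty)
```

To persist the result, the `pretty` string could be written back with something like `File.write('vulnerabilities.json', pretty)`.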

