Friday, June 21, 2013

Extract URLs from a remote webpage using PHP

Scraping data from websites is extremely popular nowadays. I have written a simple website parser class that grabs all the URLs from a website, and I've shared it below for everyone to see and have fun with.

We will use the parser class below to extract all image sources and hyperlinks from a website.
Usage:
Create an instance of the WebsiteParser class with the URL of a website, then call the getHrefLinks() and getImageSources() methods, as below, to extract its hyperlinks and image sources respectively.

View Demo :: Try it out and rate it on phpclasses.org
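The WebsiteParser class itself is PHP and isn't reproduced in this post. To give a feel for what getHrefLinks() and getImageSources() collect, here is a rough Ruby sketch of the same idea (Ruby to match the rest of this blog's code). It is not the class's implementation: extract_urls, HREF_PATTERN and IMG_SRC_PATTERN are hypothetical names, and regular expressions only approximate real HTML parsing.

```ruby
require 'uri'

# Hypothetical patterns: capture the href of <a> tags and the src of <img> tags.
# Regular expressions are only a rough approximation of real HTML parsing.
HREF_PATTERN    = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/i
IMG_SRC_PATTERN = /<img\s[^>]*?src\s*=\s*["']([^"']+)["']/i

# Collect every captured value, resolve it against the page URL (so relative
# links become absolute) and drop duplicates.
def extract_urls(html, page_url, pattern)
  urls = html.scan(pattern).flatten.filter_map do |value|
    URI.join(page_url, value.strip).to_s
  rescue URI::Error
    nil # skip values that are not valid URIs
  end
  urls.uniq
end

html = <<~HTML
  <a href="/about">About</a>
  <a class="ext" href="https://example.org/page">Elsewhere</a>
  <img src="images/logo.png" alt="logo">
  <a href="/about">About again</a>
HTML

p extract_urls(html, "http://example.com/blog/", HREF_PATTERN)
p extract_urls(html, "http://example.com/blog/", IMG_SRC_PATTERN)
```

To run it against a remote page, the HTML can be fetched first, for example with Net::HTTP.get(URI(page_url)) from Ruby's standard library.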



Wednesday, May 29, 2013

ActionView::Template::Error Couldn't find file 'jquery-ui' with Rails 3.2

Today I started working again, after six months, on one of my existing Ruby on Rails applications and ran into the error above. It turns out that jquery-rails has removed jQuery UI; the recommendation is to use the jquery-ui-rails gem instead, or to downgrade jquery-rails.

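To use the latest version:
Add jquery-ui-rails to the Gemfile and run bundle install (the require paths below come from the gem's 4.x and earlier releases; 5.0 renamed them):
gem "jquery-ui-rails", "< 5.0"
Add jQuery UI to application.js:
//= require jquery.ui.all
Add the jQuery UI CSS to application.css:
*= require jquery.ui.all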

To downgrade jquery-rails instead:
Lock the gem to an older version in the Gemfile. Version 2.0.2 solved the problem for me:
gem "jquery-rails", "2.0.2"

Saturday, March 2, 2013

SMTP, IMAP and POP server settings

GMAIL          POP                       SMTP                      IMAP
Host           pop.gmail.com             smtp.gmail.com            imap.gmail.com
Port           995                       465/587                   993
SSL Required   Yes                       Yes                       Yes

YAHOO          POP                       SMTP                      IMAP
Host           pop.mail.yahoo.com        smtp.mail.yahoo.com       NA
Port           995                       465/587                   NA
SSL Required   Yes                       Yes                       NA

YAHOO PLUS     POP                       SMTP                      IMAP
Host           plus.pop.mail.yahoo.com   plus.smtp.mail.yahoo.com  NA
Port           995                       465                       NA
SSL Required   Yes                       Yes                       NA

Windows Live   POP                       SMTP                      IMAP
Host           pop3.live.com             smtp.live.com             NA
Port           995                       25/587                    NA
SSL Required   Yes                       Yes                       NA

AOL            POP                       SMTP                      IMAP
Host           pop.aol.com               smtp.aol.com              imap.aol.com
Port           995                       587/465                   993
SSL Required   Yes                       Yes                       Yes
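Since most posts here are about Rails, here is a hedged sketch of how Gmail's outgoing server could be plugged into ActionMailer. It assumes STARTTLS on port 587 (one of the ports Google documents for smtp.gmail.com); the GMAIL_USERNAME/GMAIL_PASSWORD environment variable names and the :domain value are placeholders.

```ruby
# Hypothetical Gmail SMTP settings for ActionMailer, assuming STARTTLS on
# port 587. Credentials come from environment variables (names are placeholders).
GMAIL_SMTP_SETTINGS = {
  :address              => "smtp.gmail.com",
  :port                 => 587,
  :domain               => "gmail.com",            # placeholder HELO domain; usually your app's domain
  :user_name            => ENV["GMAIL_USERNAME"],
  :password             => ENV["GMAIL_PASSWORD"],
  :authentication       => :plain,
  :enable_starttls_auto => true                    # upgrade the plain connection with STARTTLS
}.freeze

# In a Rails app the hash would then be assigned in an environment file, e.g.
#   config.action_mailer.delivery_method = :smtp
#   config.action_mailer.smtp_settings   = GMAIL_SMTP_SETTINGS
```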

Sunday, December 16, 2012

Uninitialized constant ActiveSupport::Dependencies::Mutex (NameError)

This error may occur when running an older Rails application with a newer, incompatible RubyGems version. After a lot of googling, I came up with the following three solutions:

Using Bundler:
Click here if you are interested in seeing the solution.

Changing the RubyGems version:
  • Downgrade RubyGems to an earlier version using gem update --system {version}
  • Add require 'thread' to the Rakefile, script/server and config/environment.rb
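The require 'thread' bullet amounts to a one-line change at the very top of each of those files. A sketch, assuming a Rails 2.3.x layout; the sanity check is an optional addition, not part of the original fix.

```ruby
# Top of Rakefile, script/server and config/environment.rb (Rails 2.3.x).
# Loading Ruby's thread library up front makes sure Mutex is defined before
# ActiveSupport::Dependencies refers to it; newer RubyGems releases appear to
# no longer load it implicitly.
require 'thread'

# Optional sanity check while debugging; remove once the app boots.
raise "Mutex is still undefined" unless defined?(Mutex)
```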

Upgrade Rails:
Upgrade the application to Rails 3, though that can be very hard for an existing, running Rails application.

In my case, I was trying to run a Rails 2.3.5 application on a system with RubyGems 1.8.2. I downgraded RubyGems to 1.7.2 as described above, and everything worked fine.

Wednesday, December 12, 2012

Render HTML file in Rails using Nokogiri

You may need to render a static HTML template file from a controller action without modifying the file itself, and you may also need to replace some of its content during rendering.

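A template may contain relative paths for CSS, images, etc. Once the page is served from a controller URL, the browser resolves those paths against that URL, and Rails has no route for them, which can raise a routing error. In that case you have to rewrite the paths in the content, which is not straightforward.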
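Resolving a relative path into a full URL is the heart of that rewrite. A minimal standard-library sketch using URI.join; the host and port (localhost:3000) and the helper name absolute_asset_url are placeholders, while newsletter-template matches the directory used by the library in this post.

```ruby
require 'uri'

# Placeholder for the URL the template directory would have if the page were
# served straight from public/newsletter-template/.
TEMPLATE_BASE_URL = "http://localhost:3000/newsletter-template/"

# Resolve an asset path the way a browser would: relative paths are appended
# to the template directory, absolute URLs come back unchanged.
def absolute_asset_url(path, base = TEMPLATE_BASE_URL)
  URI.join(base, path).to_s
rescue URI::Error
  path # leave anything unparseable untouched
end

puts absolute_asset_url("css/style.css")
puts absolute_asset_url("./images/logo.png")
puts absolute_asset_url("http://cdn.example.com/reset.css")
```

Letting URI.join do the merging, rather than concatenating strings, also takes care of "./" prefixes and avoids double slashes.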

Below is an example that renders a static newsletter template placed in the public directory.

Say you have a newsletter template at public/newsletter-template/1.html (the path used by the library below) that contains something like the following, and you want to render it from newsletters#show:
{title}
{description}
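The {title} and {description} markers are what get swapped for the newsletter's attributes. The substitution step on its own, as a standard-library sketch with a hypothetical fill_placeholders helper:

```ruby
# Replace {name} markers with values from a hash. Keys are compared as
# lowercase strings (ActiveRecord's #attributes hash uses string keys), and
# unknown markers are left as they are. \w+ keeps CSS rules such as
# "body { color: red; }" from being treated as placeholders.
def fill_placeholders(html, attributes)
  values = attributes.each_with_object({}) { |(k, v), h| h[k.to_s.downcase] = v.to_s }
  html.gsub(/\{(\w+)\}/) do |marker|
    values.fetch(Regexp.last_match(1).downcase, marker)
  end
end

puts fill_placeholders("<h1>{title}</h1><p>{description}</p>",
                       "title" => "June news", "description" => "Hello, readers")
```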

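Write a library to process the newsletter template and place it in lib. (In Rails 3, lib is not autoloaded by default, so either add it to config.autoload_paths in config/application.rb or require the file explicitly.)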
require 'nokogiri'
require 'uri'

module NewsletterProcessor
  TEMPLATE_DIR = "newsletter-template"
  TEMPLATE_PATH = "public/#{TEMPLATE_DIR}/1.html"

  # Load the newsletter before the show action runs
  def self.included(base)
    base.send(:before_filter, :load_newsletter, :only => :show)
  end

  def show
    # Read the raw file so no template handler (ERB etc.) processes it
    content = File.read(Rails.root.join(TEMPLATE_PATH))
    @content = TemplateParser.new(request, @newsletter.attributes).parse(content)
    render :text => @content, :layout => false
  end

  private

  def load_newsletter
    @newsletter = Newsletter.find(params[:id])
  end

  class TemplateParser
    # Per-instance accessors: cattr_accessor would share one request and one
    # set of attributes across every parser (and every concurrent request)
    attr_accessor :request, :params

    def initialize(request, attrs)
      self.request = request
      # ActiveRecord's #attributes uses string keys; normalize all keys to strings
      self.params = attrs.each_with_object({}) { |(key, value), hash| hash[key.to_s] = value }
    end

    def parse(content)
      document = Nokogiri::HTML(content)
      parse_content(parse_assets_url(document))
    end

    private

    def parse_assets_url(document)
      # Parse image URLs (only tags that actually have a src)
      document.css("img[src]").each do |img|
        update_asset_attribute(img, "src")
      end

      # Parse stylesheet URLs
      document.css("link[href]").each do |link|
        update_asset_attribute(link, "href")
      end

      document
    end

    # Replace {placeholder} tokens. \w+ keeps CSS blocks like "{ color: red; }"
    # from matching and stops one match from spanning several tokens on a line
    def parse_content(document)
      document.to_s.gsub(/\{(\w+)\}/) do |placeholder|
        replace_attribute_value(placeholder)
      end
    end

    # Unknown placeholders are left as they are
    def replace_attribute_value(placeholder, attributes = self.params)
      key = placeholder.delete('{}').downcase
      attributes.has_key?(key) ? attributes[key].to_s : placeholder
    end

    def update_asset_attribute(ele, key)
      ele[key] = full_url(ele[key])
    end

    # Resolve a path as a browser would if the template were served from
    # /newsletter-template/: relative paths land in that directory and
    # absolute URLs are returned unchanged
    def full_url(path)
      base = "#{request.scheme}://#{request.host_with_port}/#{TEMPLATE_DIR}/"
      URI.join(base, path).to_s
    rescue URI::Error
      path
    end
  end
end

Now just include the library in the newsletters controller and open the newsletters#show page in a browser.
class NewslettersController < ApplicationController
  include NewsletterProcessor
end
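This assumes the application already has a route for newsletters#show. If it doesn't, a resource route along these lines in config/routes.rb would add one (Rails 3 syntax; YourApp stands in for the application's module name):

```ruby
# config/routes.rb (Rails 3); YourApp is a placeholder for the app's module name
YourApp::Application.routes.draw do
  resources :newsletters, :only => [:show]
end
```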
That's all!