Mar 21

In my line of work I end up moving a large amount of text from web pages into things like Word and Excel. Getting the text from the web browser is easy. Putting that text into an Office app with its formatting intact could be a lot of work, because you would have to parse through all the HTML and send the equivalent formatting commands along with the text. Fortunately, Word and Excel can paste HTML from the clipboard and render it with the correct formatting! The only problem is that, for some reason, no one added HTML support to the Win32::Clipboard gem?!? (At least not in the version for Ruby 1.8.x, which I use.) So I spent a few hours looking over the existing clipboard.rb file, found a VBA example of an HTML copy, and came up with the following code:

# htmlclipboard.rb
# Adds an HTML ("HTML Format" / CF_HTML) setter to the existing
# Win32::Clipboard class. Require this AFTER 'win32/clipboard'.
module Win32
  class Clipboard

    @cfHTML = nil

    # Register the "HTML Format" clipboard format once and cache its id.
    def self.registerHTML
      if not @cfHTML
        @cfHTML = Win32::Clipboard.RegisterClipboardFormat("HTML Format")
      end
      @cfHTML
    end

    # Takes a number and formats it as 10 digits with leading zeros.
    #   1234 => "0000001234"
    #   23   => "0000000023"
    def self.zeroFmt(nbr)
      "%010d" % nbr
    end

    def self.set_html_data(clip_data)
      sDataStart = "<HTML><BODY>\n<!--StartFragment -->"
      sDataEnd   = "<!--EndFragment -->\n</BODY></HTML>"
      # Clipboard header looks like ->
      #    Version:1.0
      #    StartHTML:0000000000
      #    EndHTML:0000000000
      #    StartFragment:0000000000
      #    EndFragment:0000000000
      # Each placeholder is 10 characters wide, the same width as the
      # zero-padded number that replaces it, so no offset shifts.
      sDataHdr  = "Version:1.0\r\nStartHTML:aaaaaaaaaa\r\nEndHTML:bbbbbbbbbb\r\n"
      sDataHdr += "StartFragment:cccccccccc\r\nEndFragment:dddddddddd\r\n"
      sData = sDataHdr + sDataStart + clip_data + sDataEnd

      sData = sData.gsub(/aaaaaaaaaa/) { zeroFmt(sDataHdr.length) }
      sData = sData.gsub(/bbbbbbbbbb/) { zeroFmt(sData.length) }
      sData = sData.gsub(/cccccccccc/) { zeroFmt(sDataHdr.length + sDataStart.length) }
      sData = sData.gsub(/dddddddddd/) { zeroFmt(sDataHdr.length + sDataStart.length + clip_data.length) }

      registerHTML

      # Copy-to-clipboard steps, following the existing set_data method
      # (compare with the clipboard.rb in your installed gem).
      self.open
      EmptyClipboard()
      hmem = GlobalAlloc(GHND, sData.length + 10)   # GHND zero-fills, so the data ends in NULs
      mem  = GlobalLock(hmem)
      memcpy(mem, sData, sData.length)
      GlobalUnlock(hmem)
      begin
        if SetClipboardData(@cfHTML, hmem) == 0
          GlobalFree(hmem)   # on success the clipboard owns hmem, so only free it on failure
          raise Error, "SetClipboardData() failed; " + get_last_error
        end
      ensure
        self.close
      end
      self
    end
  end
end

Adding support for HTML was pretty trivial. The trickiest part is that the clipboard header has fields holding byte offsets that are counted from the start of the data, header included. So you have to use placeholder values for the offset numbers, then go back, figure out what the numbers really are, and substitute them in. Because each placeholder is exactly as wide as the 10-digit number that replaces it, the substitution doesn't change any of the offsets. This code reopens the existing Win32::Clipboard class, so it can call the functions and included modules that class already has. The actual copy-to-clipboard lines are a straight copy from the existing Win32::Clipboard set_data method.
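For comparison, here is a minimal, platform-independent sketch of the same header arithmetic that you can run without Windows. It assumes Ruby 1.9.3 or later (for String#bytesize and String#byteslice), and the names build_cf_html, HEADER_TEMPLATE, PREFIX and SUFFIX are hypothetical, not part of the win32-clipboard gem. Since every offset field is exactly 10 digits wide, the header length can be computed up front instead of being patched in afterwards:

```ruby
# Hypothetical helper, not part of win32-clipboard: builds a CF_HTML
# ("HTML Format") payload with the header offsets already filled in.

# Every %010d expands to exactly 10 digits, so the header length is
# the same no matter which numbers end up in it.
HEADER_TEMPLATE = "Version:1.0\r\n" \
                  "StartHTML:%010d\r\n" \
                  "EndHTML:%010d\r\n" \
                  "StartFragment:%010d\r\n" \
                  "EndFragment:%010d\r\n"

PREFIX = "<HTML><BODY>\n<!--StartFragment -->"
SUFFIX = "<!--EndFragment -->\n</BODY></HTML>"

def build_cf_html(fragment)
  # Offsets are byte counts from the very start of the payload.
  header_len = format(HEADER_TEMPLATE, 0, 0, 0, 0).bytesize
  start_html = header_len                      # "<HTML>" begins right after the header
  start_frag = start_html + PREFIX.bytesize    # just past "<!--StartFragment -->"
  end_frag   = start_frag + fragment.bytesize  # just before "<!--EndFragment -->"
  end_html   = end_frag + SUFFIX.bytesize      # end of the whole payload
  format(HEADER_TEMPLATE, start_html, end_html, start_frag, end_frag) +
    PREFIX + fragment + SUFFIX
end

# Header reads StartHTML:0000000105, EndHTML:0000000182,
# StartFragment:0000000139, EndFragment:0000000148
puts build_cf_html("<b>Hi</b>")
```

Counting bytes (bytesize) rather than characters matters once the fragment contains non-ASCII text, because the HTML Clipboard Format expects UTF-8 and its offsets are byte offsets. In Ruby 1.8, String#length already counts bytes, which is why the gsub version works there.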
Once you have this code, you use it like so:

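require 'win32/clipboard'
require 'htmlclipboard.rb'   # must come after win32/clipboard
require 'win32ole'

class WordConst
  # Empty class to hold Word constants
end

msword = WIN32OLE.new('Word.Application')
WIN32OLE.const_load(msword, WordConst)
word = msword.Documents.Open("MyWork.doc")
html = "<HTML><BODY><h1>This is a test of the HTML Clipboard code!</h1>How did it work?</BODY></HTML>"

# Put the HTML on the clipboard, then paste it at the current selection
Win32::Clipboard.set_html_data(html)
msword.Selection.PasteAndFormat(WordConst::WdPasteDefault)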

Make sure htmlclipboard.rb is required after win32/clipboard, or you will get a number of errors! In this example, the h1 line will paste into Word using the 'H1' style, and the text after it will paste using the 'Normal' style. Since we are only registering one type of data ("HTML Format") on the clipboard, the WdPasteDefault argument to the PasteAndFormat method is enough to paste in the HTML. (WIN32OLE.const_load capitalizes the first letter of each constant, which is why Word's wdPasteDefault shows up as WordConst::WdPasteDefault.) If we had registered multiple types, we would need to be more specific.
Have fun!
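If a paste doesn't come out the way you expect, it can help to check which part of the payload the header actually points at. The HTML Clipboard Format defines the bytes between StartFragment and EndFragment as the copied fragment; for the example above, that should be the whole html string, h1 heading and "How did it work?" text included. Below is a minimal sketch of a hypothetical helper, cf_html_fragment (not a gem method), that works on a payload string rather than the live clipboard and assumes Ruby 1.9.3+ for String#byteslice:

```ruby
# Hypothetical helper, not part of win32-clipboard: returns the fragment
# that a CF_HTML ("HTML Format") payload's header points at.
# Needs Ruby 1.9.3+ for String#byteslice (in Ruby 1.8, payload[a...b]
# already works on bytes).
def cf_html_fragment(payload)
  offsets = {}
  # Header lines look like "StartFragment:0000000139\r\n"
  payload.scan(/^(StartHTML|EndHTML|StartFragment|EndFragment):(\d+)\r?$/) do |key, value|
    offsets[key] = value.to_i
  end
  start_frag = offsets.fetch("StartFragment")  # raises if the field is missing
  end_frag   = offsets.fetch("EndFragment")
  # Offsets count bytes from the very start of the payload, header included
  payload.byteslice(start_frag...end_frag)
end

payload = "Version:1.0\r\n" \
          "StartHTML:0000000105\r\nEndHTML:0000000182\r\n" \
          "StartFragment:0000000139\r\nEndFragment:0000000148\r\n" \
          "<HTML><BODY>\n<!--StartFragment --><b>Hi</b>" \
          "<!--EndFragment -->\n</BODY></HTML>"
puts cf_html_fragment(payload)   # prints <b>Hi</b>
```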

