In my line of work I often need to move large amounts of text from web pages into Word and Excel. Getting the text out of the web browser is easy. Putting it into an Office app with its formatting intact could be a lot of work: you would have to parse the HTML and send the equivalent formatting commands along with the text. Fortunately, Word and Excel can paste HTML from the clipboard and render it with the correct formatting. The only problem is that, for some reason, no one added HTML support to the Win32::Clipboard gem (at least not in the version for Ruby 1.8.x, which I use). So I spent a few hours looking over the existing clipboard.rb file, found a VBA example of an HTML copy, and came up with the following code:
#
# htmlclipboard.rb
#
module Win32
  class Clipboard
    @cfHTML = nil

    def self.registerHTML
      if not @cfHTML
        @cfHTML = Win32::Clipboard.RegisterClipboardFormat("HTML Format")
      end
    end

    # zeroFmt(nbr)
    # -- Takes a number and pads it with leading zeros to 10 digits.
    #    1234 => "0000001234"
    #    23   => "0000000023"
    def self.zeroFmt(nbr)
      zeros = "0"*10
      len = nbr.to_s.length # this is the number of digits in the number 2784 => 4
      return zeros.slice(0..(zeros.length - len - 1)) + nbr.to_s
    end

    def self.set_html_data(clip_data)
      sDataStart = "<HTML><BODY>\n<!--StartFragment -->"
      sDataEnd = "<!--EndFragment -->\n</BODY></HTML>"
      sData = ""
      #
      # Clipboard Header: looks like ->
      # Version:1.0
      # StartHTML:0000000000
      # EndHTML:0000000000
      # StartFragment:0000000000
      # EndFragment:0000000000
      sDataHdr = "Version:1.0\r\nStartHTML:aaaaaaaaaa\r\nEndHTML:bbbbbbbbbb\r\n"
      sDataHdr += "StartFragment:cccccccccc\r\nEndFragment:dddddddddd\r\n"
      sData = sDataHdr + sDataStart + clip_data + sDataEnd
      # Each placeholder is 10 characters wide, so substituting the
      # 10-digit offsets does not change any of the lengths used below.
      sData = sData.gsub(/aaaaaaaaaa/) { zeroFmt(sDataHdr.length) }
      sData = sData.gsub(/bbbbbbbbbb/) { zeroFmt(sData.length) }
      sData = sData.gsub(/cccccccccc/) { zeroFmt(sDataHdr.length + sDataStart.length) }
      sData = sData.gsub(/dddddddddd/) { zeroFmt(sDataHdr.length + sDataStart.length + clip_data.length) }

      if not @cfHTML
        self.registerHTML
      end

      self.open
      EmptyClipboard()
      hmem = GlobalAlloc(GHND, sData.length + 10)
      mem = GlobalLock(hmem)
      memcpy(mem, sData, sData.length)
      begin
        if SetClipboardData(@cfHTML, hmem) == 0
          raise Error, "SetClipboardData() failed; " + get_last_error
        end
      ensure
        GlobalFree(hmem)
        self.close
      end
      self
    end
  end
end
Adding support for HTML was pretty trivial. The trickiest part is that the clipboard header has fields that require byte offsets, and those offsets include the header itself. So you have to use placeholder values for the offset numbers, then go back, work out what the numbers really are, and substitute them in. This works because each placeholder is exactly ten characters wide, the same width as the zero-padded number that replaces it, so the substitution never changes the header's length. This code reopens the existing Win32::Clipboard class, so it can use the functions and includes already defined in that class. The actual copy-to-clipboard lines are a straight copy from the existing Win32::Clipboard set_data method.
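If you want to see the offset arithmetic on its own, here is a minimal sketch with the clipboard calls left out, so it runs anywhere. It assumes Ruby 1.8.7 or later (for String#bytesize), and build_cf_html is a hypothetical helper name I made up for illustration; it is not part of Win32::Clipboard. Instead of substituting placeholders, it measures the header with zeros filled in, which works because each offset is always printed as ten digits.

```ruby
# Build an "HTML Format" payload in pure Ruby; no clipboard access.
# build_cf_html is a hypothetical helper, not part of Win32::Clipboard.
def build_cf_html(fragment)
  prefix = "<HTML><BODY>\n<!--StartFragment -->"
  suffix = "<!--EndFragment -->\n</BODY></HTML>"

  # Every offset is rendered as exactly 10 digits, so the header size
  # can be measured before the real values are known.
  template = "Version:1.0\r\nStartHTML:%010d\r\nEndHTML:%010d\r\n" \
             "StartFragment:%010d\r\nEndFragment:%010d\r\n"
  header_size = format(template, 0, 0, 0, 0).bytesize

  # All offsets are byte counts from the start of the payload.
  start_html     = header_size
  start_fragment = start_html + prefix.bytesize
  end_fragment   = start_fragment + fragment.bytesize
  end_html       = end_fragment + suffix.bytesize

  format(template, start_html, end_html, start_fragment, end_fragment) +
    prefix + fragment + suffix
end

puts build_cf_html("<h1>Hello</h1>")
```

Counting with bytesize rather than length should keep the offsets right on Ruby 1.9 and later, where length counts characters and non-ASCII text would otherwise shift the fragment boundaries.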
Once you have this code, you use it like so:
...
require 'win32/clipboard'
require 'htmlclipboard.rb'
require 'win32ole'
class WordConst
#Empty class to hold Word Constants
end
msword = WIN32OLE.new('Word.Application')
WIN32OLE.const_load(msword, WordConst)
word = msword.Documents.Open("MyWork.doc")
...
html = "<HTML><BODY><h1>This is a test of the HTML Clipboard code!</h1>How did it work?</BODY></HTML>"
Win32::Clipboard.set_html_data(html)
msword.Selection.PasteAndFormat(WordConst::WdPasteDefault)
Make sure htmlclipboard.rb is required after win32/clipboard, or you will get a number of errors, because it reopens a class that must already be defined. In this example, the h1 line will paste into Word using the 'H1' style, and the following text will paste into Word using the 'Normal' style. Since we are only registering one type of data ("HTML Format") in the clipboard, the WdPasteDefault parameter to the PasteAndFormat method can be used to paste in the HTML. If we had registered multiple types, we would need to be more specific.
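If a paste comes out empty or mangled, one thing worth checking is whether the header offsets really point at the fragment. The sketch below reads the offsets back out of an "HTML Format" payload and slices out the fragment. It is only a debugging aid under a couple of assumptions: extract_fragment is a hypothetical helper name of my own, and String#byteslice needs Ruby 1.9.3 or later, so this will not run on 1.8.x as written. The sample payload is hand-built to match the header layout described above.

```ruby
# Read the fragment offsets from an "HTML Format" payload and return
# the bytes between them. extract_fragment is a hypothetical helper.
def extract_fragment(payload)
  start_at = payload[/^StartFragment:(\d+)/, 1]
  end_at   = payload[/^EndFragment:(\d+)/, 1]
  return nil unless start_at && end_at

  # String#to_i reads the zero-padded digits as a plain decimal number.
  first = start_at.to_i
  payload.byteslice(first, end_at.to_i - first)
end

# Hand-built sample: a 105-byte header followed by the wrapped fragment.
sample = "Version:1.0\r\n" \
         "StartHTML:0000000105\r\n" \
         "EndHTML:0000000184\r\n" \
         "StartFragment:0000000139\r\n" \
         "EndFragment:0000000150\r\n" \
         "<HTML><BODY>\n<!--StartFragment -->" \
         "<b>bold</b>" \
         "<!--EndFragment -->\n</BODY></HTML>"

puts extract_fragment(sample)
```

If what comes back is shifted by a few characters, the offsets were most likely computed in characters rather than bytes, or the header was edited after the numbers were filled in.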
Have Fun!






