OdeToCode

MetaBlog into a Local Directory

Friday, March 9, 2007

Microsoft Word 2007 produces relatively clean HTML when you use the Publish feature to create a blog post. Although XHTML purists will still be unhappy with anything they don't write themselves, the HTML far surpasses anything we've seen from previous versions of Word. Unfortunately, as far as I can tell, this feature is only available for blog posting. The "Web Page" and "Web Page, Filtered" options in the "Save As" menu still produce the same .mso-littered HTML that makes Word impossible to use as a serious HTML editor. I'd like to use the HTML output by the Publish feature for purposes other than blogging.

I wasn't sure how to get at the output of this Publish feature, but after looking at the MetaWeblog API that Word can consume, I decided it wouldn't be too hard to write something in ASP.NET that would run on localhost and give me exactly what I wanted from Word: convert a document into clean HTML and PNG graphics and drop the files into a local directory. The job turned out to be even easier when I snooped around the SubText Subversion repository and discovered that Cook Computing's XML-RPC library does all the heavy lifting and XML parsing.
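To get a feel for what that heavy lifting involves: a client like Word talks to the blog by POSTing a small XML `<methodCall>` document in which every parameter is wrapped in an element naming its XML-RPC type. The sketch below builds such a request with System.Xml.Linq so you can see the shape on the wire. `XmlRpcRequest` is a hypothetical helper, not part of Cook Computing's library, and it only handles a few scalar types.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of XML-RPC.NET): builds the kind of <methodCall>
// document a client such as Word POSTs to an XML-RPC endpoint.
// Only string, int and bool parameters are handled in this sketch.
public static class XmlRpcRequest
{
    public static string Build(string methodName, params object[] args)
    {
        XElement call = new XElement("methodCall",
            new XElement("methodName", methodName),
            new XElement("params",
                args.Select(a => new XElement("param",
                    new XElement("value", Encode(a))))));
        return call.ToString(SaveOptions.DisableFormatting);
    }

    // Each scalar is wrapped in an element that names its XML-RPC type.
    static XElement Encode(object arg)
    {
        if (arg is int) return new XElement("int", arg);
        if (arg is bool) return new XElement("boolean", (bool)arg ? "1" : "0");
        if (arg is string) return new XElement("string", arg);
        throw new ArgumentException("Type not handled in this sketch: " + arg);
    }
}
```

Calling `XmlRpcRequest.Build("blogger.getUsersBlogs", "key", "scott", "secret")` produces a single-line `<methodCall>` whose three `<param>` elements each hold a `<string>` value.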

It took a bit of debugging to discover that the IMetaWeblog interface defined in the XML-RPC library (interfaces\MetaWeblogAPI.cs) needs a few tweaks to work with Word. First, Word invokes a blogger.getUsersBlogs method that isn't defined in the interface, but it's easy to add:

```csharp
[XmlRpcMethod("blogger.getUsersBlogs", Description = "...")]
BlogInfo[] getUsersBlogs(string appKey, // the Blogger API calls this parameter appkey
                         string username, string password);
```
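The interface only declares the method; the handler still has to answer it. Since this handler serves a single local directory, a plausible minimal answer is one "blog" pointing back at localhost. In the sketch below, `LocalBlogInfo` is a hypothetical stand-in for the library's `BlogInfo` struct (field names taken from the Blogger API's getUsersBlogs response: blogid, url, blogName), and the URL and blog name are placeholders.

```csharp
// Hypothetical stand-in for the library's BlogInfo struct; the field names
// follow the Blogger API's getUsersBlogs response.
public struct LocalBlogInfo
{
    public string blogid;
    public string url;
    public string blogName;
}

public static class LocalBlogs
{
    // Returns a single entry that points back at the local handler.
    // The URL and name are placeholders; credentials are ignored on the
    // assumption that the handler only ever runs on localhost.
    public static LocalBlogInfo[] GetUsersBlogs(string appKey, string username, string password)
    {
        LocalBlogInfo info = new LocalBlogInfo();
        info.blogid = "1";
        info.url = "http://localhost/metablog/";
        info.blogName = "Local directory";
        return new LocalBlogInfo[] { info };
    }
}
```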


Second, Word appears to pass an integer for the blogid parameter of the newMediaObject method, while the service expects a string. I don't know enough about the history of the MetaWeblog API to know who is wrong in this scenario, but it's easy to fix the method definition in the interface.

```csharp
[XmlRpcMethod("metaWeblog.newMediaObject",
    Description = "Makes a new file to a designated blog using the "
                + "metaWeblog API. Returns url as a string of a struct.")]
MediaObjectInfo newMediaObject(
    int blogid, // this was a string, but that doesn't work with Word...
    string username, string password, FileData file);
```
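On the wire, the difference is just the type element inside `<value>`: Word apparently sends `<int>1</int>` where the interface declared a string, and the string-typed signature can't accept it. The sketch below, using only System.Xml.Linq, shows a tolerant reader that accepts either form, plus `<i4>` (the XML-RPC spec's alias for int) and an untyped value, which XML-RPC treats as a string. `XmlRpcValue` is a hypothetical helper for illustration; with the XML-RPC library the fix is the signature change in the interface.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only: reads a blogid from an XML-RPC
// <value> element whether it arrived typed as int or as string.
public static class XmlRpcValue
{
    // The XML-RPC type of a <value>; an untyped value counts as a string.
    public static string TypeOf(XElement value)
    {
        XElement typed = value.Elements().FirstOrDefault();
        return typed == null ? "string" : typed.Name.LocalName;
    }

    // Accepts <int>, <i4>, <string> or an untyped value and always returns text.
    public static string ReadBlogId(XElement value)
    {
        string type = TypeOf(value);
        if (type != "int" && type != "i4" && type != "string")
            throw new FormatException("Unexpected blogid type: " + type);
        XElement typed = value.Elements().FirstOrDefault();
        return typed == null ? value.Value : typed.Value;
    }
}
```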


One last change is to Refactor -> Rename the UrlInfo struct in MetaWeblogAPI.cs to MediaObjectInfo. The rename allows Word and the MetaBlog service to agree on the name of the struct.

```csharp
public struct MediaObjectInfo // this used to be called UrlInfo
{
    public string url;
}
```
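For the images, an implementation of newMediaObject mostly needs to write the uploaded bytes somewhere and hand back the URL that ends up in `MediaObjectInfo.url`. Here's a rough sketch of that file-handling part. `UploadedFile` is a hypothetical stand-in for the library's `FileData` (the MetaWeblog spec describes an upload as a struct with name, type and bits), and `rootDir` and `baseUrl` are assumptions about where the files live and how they're served.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the library's FileData struct. The MetaWeblog spec
// describes an upload as a struct with name, type and bits (base64 on the wire).
public struct UploadedFile
{
    public string name;
    public string type;
    public byte[] bits;
}

public static class LocalMedia
{
    // Writes the uploaded bytes into rootDir and returns the URL for the image.
    // rootDir and baseUrl are assumptions about the local setup.
    public static string Save(string rootDir, string baseUrl, UploadedFile file)
    {
        // Keep only the last path segment (handling both / and \) so a name
        // with folder parts cannot write outside rootDir.
        string fileName = Path.GetFileName(file.name.Replace('\\', '/'));
        File.WriteAllBytes(Path.Combine(rootDir, fileName), file.bits);
        return baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(fileName);
    }
}
```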


Once all this is done, it's a simple matter to implement the interface in an HttpHandler (.ashx file).

```csharp
public class MetaWebLogging : XmlRpcService, IMetaWeblog
{
    // ...
}
```
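Conceptually, the service's job is to match the methodName Word sends against the names in the `[XmlRpcMethod]` attributes and call the matching method. The toy dispatcher below illustrates that routing with plain reflection. `RpcNameAttribute`, `RpcDispatcher` and `EchoService` are hypothetical names, and this is not how XML-RPC.NET is actually implemented (the real service also parses the request, maps types and returns faults). It does show why the missing blogger.getUsersBlogs mattered: a name with no matching method simply fails.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute playing the role of XmlRpcMethod in this sketch.
[AttributeUsage(AttributeTargets.Method)]
public class RpcNameAttribute : Attribute
{
    public string Name { get; }
    public RpcNameAttribute(string name) { Name = name; }
}

// Toy dispatcher: finds the public method tagged with the requested wire name
// and invokes it with arguments that have already been decoded.
public static class RpcDispatcher
{
    public static object Invoke(object service, string wireName, params object[] args)
    {
        MethodInfo method = service.GetType()
            .GetMethods(BindingFlags.Public | BindingFlags.Instance)
            .FirstOrDefault(m => m.GetCustomAttribute<RpcNameAttribute>()?.Name == wireName);
        if (method == null)
            throw new MissingMethodException("No handler for " + wireName);
        return method.Invoke(service, args);
    }
}

// Example service with one routed method.
public class EchoService
{
    [RpcName("blogger.getUsersBlogs")]
    public string GetUsersBlogs(string appKey, string username, string password)
    {
        return "blogs for " + username;
    }
}
```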


Each method needs an implementation. For my workflow, I'm moving files around on the hard drive, but here is a sample implementation for the newPost method that will dump the incoming HTML into a file in the root directory of the application.

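```csharp
public string newPost(string blogid, string username, string password,
                      Post post, bool publish)
{
    // Needs System.IO and System.Web. Assumed: app root + ".html" extension.
    string fileName = Path.Combine(HttpRuntime.AppDomainAppPath,
                                   post.title + ".html");
    // File.Create truncates; File.OpenWrite would leave stale bytes behind.
    using (FileStream fs = File.Create(fileName))
    using (StreamWriter writer = new StreamWriter(fs))
    {
        writer.Write(post.description); // MetaWeblog sends the HTML body here
    }
    return Path.GetFileName(fileName);
}
```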
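One thing to watch in the newPost sample: the post title goes straight into a file name, and titles can contain characters such as ? or : that Windows won't accept. A small helper like the hypothetical `PostFiles.FileNameFor` below could build the name instead; the `.html` extension and the "untitled" fallback are arbitrary choices for this sketch.

```csharp
using System.Linq;

public static class PostFiles
{
    // Characters Windows rejects in file names, hard-coded so the result does
    // not depend on the OS the code runs on.
    static readonly char[] Invalid = { '<', '>', ':', '"', '/', '\\', '|', '?', '*' };

    // Hypothetical helper: turns a post title into a file name by replacing
    // invalid and control characters with '_' and appending ".html".
    public static string FileNameFor(string title)
    {
        string cleaned = new string((title ?? "").Trim()
            .Select(c => Invalid.Contains(c) || char.IsControl(c) ? '_' : c)
            .ToArray());
        if (cleaned.Length == 0)
            cleaned = "untitled";
        return cleaned + ".html";
    }
}
```

The character set is hard-coded because `Path.GetInvalidFileNameChars` returns a much smaller set on non-Windows systems. Reserved names such as CON are not handled.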


Now I just point Word 2007's Publish feature to my local metablog.ashx file and export documents as HTML. For what I needed to do, this little hack was a huge time saver. Hopefully, future versions of Word will make this even easier.

Haacked Friday, March 9, 2007
I'm pretty sure that Word is wrong. That method of MetaWeblogAPI is based on the Blogger API (www.blogger.com/.../xmlrpc_getUsersBlogs.html), which takes in a string as the blog id.

This makes sense because some blogs might use a GUID or some other non-integer value.
scott Friday, March 9, 2007
I was thinking that was a bit odd of Word. I wonder if it is a bug?
Sahil Malik Friday, March 9, 2007
Hey Scott -

SharePoint 2007 has a document conversions feature where you upload a Word doc, and it fires up a workflow and converts & publishes an HTML version of the Word doc. That way, you don't have to write ANY code :).

That's 1 step lazier than even this. LOL

Rick Strahl Friday, March 9, 2007
Hey Scott, any chance you can post this code?

In fact, just yesterday I was looking at the MetaWeblog API myself to add to my blog but hadn't gotten around to it, and like you, my first stop was SubText <s>. I still haven't moved forward with this, so any 'standalone' application would help...
Rick Strahl Saturday, March 10, 2007
Scott... I had some spare time tonight, so I implemented the MetaWeblog API on my blog, and while I was at it I checked out the Word issue as well.

Oddly, for getCategories (which also receives the blogId parameter) the value is passed as a string. Somebody was sleeping on the job...

For newPost, another solution is to declare the blogId as object in both the interface and the method. This lets the same interface work with both string and int values; in both cases it ends up as a string in the code. Actually, any of the methods that pass a blogId would have to be changed to object (newMediaObject in particular).
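Rick's suggestion, sketched: if the interface declares blogid as object, the method body can normalize whatever arrives before using it. This assumes, as Rick describes, that the library will bind either an `<int>` or a `<string>` to an object parameter; `BlogIds.Normalize` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class BlogIds
{
    // Hypothetical helper: with blogid declared as object, convert whatever
    // the library handed over (a boxed int or a string) into a string.
    public static string Normalize(object blogid)
    {
        if (blogid == null)
            throw new ArgumentNullException(nameof(blogid));
        return Convert.ToString(blogid, CultureInfo.InvariantCulture);
    }
}
```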