[disclaimer: this post was kind of hastily written and as such does not really go into proper detail for some things. rewrite pending?]
My homepage now has RSS for its drawings section!
You can find it over HERE.
I will now proceed to write about the very stupid way I implemented an automatic RSS feed. Incoherent text ahead.
So the drawings page is just one long HTML file that I edit each time I add a new drawing. Each drawing is contained in its own figure element, which holds two captions and an image. (A time element would come later, but it wasn't there originally.)
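A typical entry looks roughly like this (the class name and file names here are made up for illustration):

<figure class="wide">
    <figcaption>Drawing title</figcaption>
    <img src="some-drawing.gif" alt="Drawing title">
    <figcaption>A little note underneath</figcaption>
</figure>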
Some Thinking
How do we implement this? The general approach for making discrete pages and a feed out of these would be to either:
- A) Use an HTML parser to read the page and an XML writer to write the feed (plus an HTML writer to make the discrete pages), or
- B) Just do some regex matching to rip the details out and concatenate strings (while replacing some substrings)
... This is a tough choice. Thankfully, a detail in how I write these pages will push us in the right direction. That direction is laziness, and it isn't right. Maybe.
See, this site uses "Server Side Includes" (SSI). They let you add some dynamic elements to an otherwise static HTML page. Actually, "some" is perhaps an understatement, because you can drop the entire output of a script in there. As for myself, the only thing I am using is the include directive, which simply inserts the contents of one file into another. You can nest them, but that's disabled in my server configuration.
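An include directive looks like this (the path is made up for illustration):

<!--#include virtual="/includes/header.html" -->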
I use them to include certain chunks of HTML that I may want to change at some point, without having to go to each page just to change them. There are about four of them on each page, on average. Either way, what does this mean for us?
Well... I don't feel like writing code that resolves these includes when my HTML parsing script runs, so we will just not do that.
Option B it is. At this point I am realizing that writing this post beyond its beginning is kind of pointless. But I will continue anyway.
Let's actually do it!
I'll be doing this in PowerShell, because that ended up being the language I have more than passing experience in. PowerShell actually has an HTML parser built in, so it would've worked well for the parsing route... until it got removed in version 7, anyway. I can't use an older version without extra setup, because my server talks penguin; I guess they couldn't rely on Windows-specific things when they went cross-platform. I briefly tried a module that reads HTML, but there were some "my computer" related issues when testing locally. (It's pretty old at this point...)
Let's figure out a regex pattern that'll work.
We can't just do a simple <figure>.+?</figure>, because:
- 1. Some figures have a class set, which I use to set figure/image dimensions.
- 2. ...We do need to actually get data out of the figure, so we need capture groups.
<figure(?: class=".+?")?>.+?<figcaption>(.+?)<\/figcaption>.+?src="([^"]+)\.gif".+?<figcaption>(.+?)<\/figcaption>.+?<\/figure>
It ended up looking like that. Some additions would come later for timestamps and such. Actually, speaking of timestamps...
Tedium of Time
Did you know that adding timestamps to an entire backlog is kinda tedious? I wouldn't recommend it.
This whole time, I had not noted down when I made each drawing. This makes building an RSS feed somewhat difficult, since feeds would like you to provide publishing dates. And pairing images with their on-disk creation dates isn't quite the catch-all answer I would've liked it to be.
My drawing predates the drawings page. I started making drawings again because of drawing prompts a friend hosted, and those would get uploaded onto a site. The drawing tool I used (and still use) is a webpage, so when I saved the drawings, they would just get dropped into my Downloads folder. I didn't bother organizing them into folders. (I actually still save them there, but I do copy them to a different folder so I can process them for uploading.) And now you know.
I started out by going to the site I uploaded drawings to and taking the timestamps from there. These were in ISO format, so it wasn't too much trouble converting them into something I could paste. As for drawings made after 2022? I wasn't uploading them onto the site anymore; not most of them, anyway. For some reason...
I took a directory listing of the server's drawings folder and used the timestamps there. You'd think that would've worked just fine... except no. There have been times where I haven't actively updated the drawings page and instead ended up adding drawings in bulk later, so some drawings would have incorrect timestamps. Yay. What now?
... Well, it wasn't too much trouble this time (though it was still pretty tedious), as all the drawings that had incorrect timestamps were in the folder I copy drawings to. So it was just a matter of locating each file and getting its creation time... Ah.
Your average file browser doesn't include a way to copy a machine-readable timestamp to the clipboard. So I had to view the file details and convert the displayed times manually. Whee. At least after this, it was done.
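In hindsight, a one-liner could have spared me the manual conversion. Something like this (the path is made up) prints a creation time the way RSS likes its dates:

# Creation time in UTC, formatted as RFC 1123 for RSS's pubDate
(Get-Item ".\copies\drawing.gif").CreationTime.ToUniversalTime().ToString("r")

Oh well. Let's get back on track now...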
Back on Track
Now that we have a pattern, we can use it to extract data from the HTML and make a list. And then we can iterate on that list and do a lot of stuff.
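The extraction went something along these lines (a rough sketch rather than the exact script; the file name and the exact shape of the objects are illustrative):

$html = Get-Content ".\drawings.html" -Raw    # -Raw gives one big string instead of a line array
$pattern = '<figure(?: class=".+?")?>.+?<figcaption>(.+?)<\/figcaption>.+?src="([^"]+)\.gif".+?<figcaption>(.+?)<\/figcaption>.+?<\/figure>'
$drawings = [ordered]@{}
# Singleline makes . match newlines, since each figure spans several lines
foreach ($m in [regex]::Matches($html, $pattern, 'Singleline')) {
    $id = [System.IO.Path]::GetFileName($m.Groups[2].Value)
    $drawings[$id] = [pscustomobject]@{
        ID     = $id
        Top    = $m.Groups[1].Value   # first caption
        Bottom = $m.Groups[3].Value   # second caption
        Src    = $m.Groups[2].Value + ".gif"
        Raw    = $m.Value             # the figure's outerHTML
    }
}

And here's the iteration, doing said stuff: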
foreach ($id in $ids) {
    # Take the drawing by its ID
    $entry = $drawings[$id]
    # Take some strings, substitute parts (title and description) and join them together, then write it
    $str = $part1.Replace("[[FIG1]]", $entry.Top).Replace("[[FIG2]]", $entry.Bottom) + $entry.Raw + $part2
    $str | Out-File ".\out\$($entry.ID).html" -Encoding utf8
}
This is the code that generates the discrete pages. For reference, $part1 and $part2 here are the two halves of an HTML document, and Raw is the outerHTML of the figure. Yes, I realize I could've just had one part and used the string Replace method to insert the figure HTML, but it's (not) too late for me to go back and change that. Too lazy.
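For the record, the non-lazy version would be a single template with one more placeholder, something like this (the [[FIGURE]] keyword and $template name are hypothetical):

# One template with a [[FIGURE]] placeholder instead of two halves
$str = $template.Replace("[[FIG1]]", $entry.Top).Replace("[[FIG2]]", $entry.Bottom).Replace("[[FIGURE]]", $entry.Raw)

Anyway, on to the feed itself: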
$xml = ""
foreach ($id in $lastten) {
$entry = $drawings[$id]
$item = $xml_part2
$item = $item.Replace("[[ID]]", $entry.ID)
$item = $item.Replace("[[DATE]]", $entry.Date)
$item = $item.Replace("[[TOP]]", (escstr $entry.Top))
$item = $item.Replace("[[BOTTOM]]", $entry.Bottom)
$item = $item.Replace("[[SRC]]", $entry.Src)
$xml += $item
}
And here we have the code that generates the XML for each item. It's more "replace this string" business, which was pretty simple. PowerShell does give you access to an XML writer through its dotnet-ness, but I found it slightly frustrating to use... so I just did the lazy method.
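For the curious, the non-lazy XmlWriter route would look roughly like this (a sketch only; the element names and URL are illustrative, and I didn't keep my actual attempt):

$settings = New-Object System.Xml.XmlWriterSettings
$settings.Indent = $true
$writer = [System.Xml.XmlWriter]::Create(".\out\feed.xml", $settings)
$writer.WriteStartElement("item")
$writer.WriteElementString("title", $entry.Top)   # escaping is handled for you
$writer.WriteElementString("link", "https://example.net/drawings/$($entry.ID).html")
$writer.WriteElementString("pubDate", $entry.Date)
$writer.WriteEndElement()
$writer.Close()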
One fellow might wonder, "Wait, what about CDATA?", to which I just say "ehh, let's just avoid ]]>". Easy. The rest is just inserting this string of XML between two other pre-written parts of XML, which of course have had some keywords replaced with information as necessary.
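The escstr helper doing the escaping above isn't anything fancy, by the way. A minimal version might look like this (mine may differ slightly):

function escstr([string]$s) {
    # Escape the characters XML actually cares about, ampersand first
    $s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;")
}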
... Is that it?
One more thing
This website accepts plain HTTP requests. It only redirects to HTTPS if a certain header in the request is set; modern browsers send it, old ones don't. Some old browsers on devices like game consoles may not support Let's Encrypt certificates (nicely), so keeping HTTP accessible is necessary.
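Roughly, the rule looks like this (not my exact config; this assumes the Upgrade-Insecure-Requests header and a lighttpd new enough to match on request headers):

$HTTP["scheme"] == "http" {
    # Modern browsers send this header; old ones don't, so they stay on HTTP
    $REQUEST_HEADER["Upgrade-Insecure-Requests"] == "1" {
        url.redirect = ("^/(.*)" => "https://example.net/$1")
    }
}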
(Did you know that the browsers on both Wii and DSi support RSS feeds? You do have to manually visit them, but at least you can bookmark them.)
Then, I realized: "What if you access an RSS feed over HTTP? What then?" This turned out to be pretty important, actually. See, some readers on those older systems might not be happy about seeing https links, so the links in the feed have to use http://. But then, what if you use a modern RSS reader? That adds some slight complexity like unnecessary redirects and other stuff. Gosh, what do I do...
Thankfully, server configuration comes to the rescue! I was able to configure my server so that it silently serves a different file when you read the feed, depending on the protocol you used. Pretty neat. I now generate two feed files, basing one on the other by just swapping the protocol in the links. Here's what I used for lighttpd, by the way:
$HTTP["scheme"] == "https" {
$HTTP["host"] == "example.net" {
# Prepend "_prefix_" to filename if trying to read xml file
url.rewrite-once = ("^(.*?)([^\/\.]+\.xml)$" => "$1_prefix_$2")
}
}
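Generating the second feed is one more line of string-replacement business (file names here are stand-ins; the prefixed file is the one the rewrite rule serves over HTTPS):

# The http variant is just the https feed with the protocol in the links swapped
(Get-Content ".\out\_prefix_feed.xml" -Raw).Replace("https://example.net", "http://example.net") |
    Out-File ".\out\feed.xml" -Encoding utf8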
... And that's all I had to write about? I think? Wow, I sure wrote a lot of words for such a bad implementation. If someone actually read the whole thing, I'd be surprised. I don't even know if anyone subscribes to these! Maybe I'll just stay blissfully unaware.
Thank you for reading, maybe.