Alexander Dialog John 1993-11-13


		Date:	1993-11-13
		From:	John
		Subject:	Shoreline Space Storage

My Dear Mr. Duk:

Welcome back from your travels! I hope you will have an enjoyable and productive weekend.

I am making slow but steady progress in the quest for smoothing and the first screenshot of a smoothed coastline is almost within my grasp. However, the sheer size of the "shoreline space" has been a formidable challenge.

As you'll recall, OracleCard limits me to about a million square pixels, which gives me a map size of 50x70 coordinates at 18x14 pixels per coordinate. Several days ago I perfected the algorithm to precisely identify (and color) all the pixels that make up the shoreline area as well as the initial land sparks. Identifying these pixels was one thing, storing them in a way that permits random lookups was another. OracleCard limits its containers to 32000 characters, which means I have to divide my lookup maps into mulitple (and possibly numerous!) containers. What a pain!

I finally solved that problem a few hours ago and am now ready for the final assault. Your storage constraints in C++ are somewhat different, but still present a challenge (I presume). Have you decided how you are going to store the map of shoreline space needed by the (micro) continent generation? Perhaps you can apply some of the lessons I've learned (or at least better estimate the costs). I pondered three basic strategies for storing the shoreline space map. The storage costs (in bytes) are estimates for a 100x100 coordinate map at 18x14 pixels per coordinate:

The Full Map Approach
Cost: 2,520,000

The idea here is to use the display map as a lookup table. The full display consists of a 8 bit deep color window that is 1800 pixels wide and 1400 pixels high. To use this window as a lookup table you would first paint the pixels of the shoreline area a neutral gray. As the continent generation algorithm inspects the sparklist, it peeks at the actual color of the pixels to determine their state (land, sea, or not-yet-defined) and paints as it goes. This is the simplest approach but also the most costly (unless you are already storing this map in all its glory). Unfortunately, OracleCard does not let me use this approach. Do you store the whole map at once in your C program? Or do you summon it up on the fly as the user scrolls around?

The Seashore Blocks Approach
Cost: 750,000 + 20,000 for additonal index table

The second approach, and the one you seem to be gravitating towards from what I understand, is to store pixel details only for shoreline coordinates. Since about 30% of the coordinates in a typical map include shoreline pixels, this approach provides an immediate savings of 70%. However it does require some extra bookkeeping. How can the continent generation algorithm do a random x,y lookup when all the discontinuous coastline coordinates are smashed together into a single continuous mass?

The technique I used was to construct a separate shoreline map lookup table. This table has one element for each of the 10,000 coordinates of the macro-map. If the coordinate in question is inland or deepsea, the corresponding element is 0. If the coordinate is part of the shoreline area, the element is an index to the list of seashore coordinate blocks. The first seashore coordinate is assigned a 1, the second a 2, and so on up to the total number of seashore coordinates identified (about 3000). This map can be generated during the process of identifying the seashore pixels.

Given an absolute pixel location for a spark (say 1623,1230), the continent generation algorithm can lookup the current state of the eight neighboring pixels as follows. First divide by 18 horizontal and 14 vertical to obtain the macro coordinate in which this pixel is located: 91,88. Now lookup element 91,88 in the shoreline map and find the index number, say 2744. (No spark will ever have neighbors that fall within a non-seashore coordinate, so the index returned will never be 0; you still have to beware of pixels on the edge of the map, however.)

Now find coordinate pixel block number 2744 and calculate the local offset (1623-(91-1)*18,1230-(88-1)*14); in this case the offset is 3,12 and all 8 neighbors can be found in the same pixel block. When the pixel falls on the edge of a block you will need to return to the shoreline map to discover which block holds some of its neighboring pixels.

The Undiluted Pixels Approach
Cost: 300,000

Although the second approach does result in a 70% storage savings, it is still rather wasteful. It turns out that, on average, only about 40% of the pixels in a shoreline coordinate block are included in the original shoreline area; the remaining 60% is dead weight, which will never be examined or altered. The Undiluted Pixels Approach squeezes out the remaining fat but requires a different method to do the lookup. You will have to decide for yourself if the tradeoff is worthwhile. I was forced to partially implement this method.

The basic idea is to replace each sequence of non-shoreline pixels with a single value: the number of pixels in the sequence. For each of the 1400 rows of pixels in the full map, scan across and replace every set of n land/sea pixels with the number n. (If you wish to store this value as a single byte you can break it into multiple numbers which add up to n; 302 could be represented as a 255 followed by a 48).

The endpoints of each sequence are not included. Thus if a string of 115 sea pixels are followed by 6 undefined shoreline pixels, the line would be reduced to the number 114, followed by a '2' to represent the single ending sea spark, followed by 6 '0's to represent the six undefined pixels. 0 means undefined, 1 means land, 2 means sea, and 3 or higher means a sequence of length n. (A sequence of 2 non-shoreline pixels should be represented as either '11' or '22'). The shoreline space, then, is represented by 1400 strings of varying length (typically about 220 bytes).

The lookup procedure does not require the shoreline map needed by approach two; we are dealing entirely on the pixel level. To lookup a spark at <1623,1230> we first find string number 1230. Our problem is then to figure out which byte in the string corresponds to 1623. We determine this by starting at the beginning of the string and counting. A 0,1, or 2 byte adds 1 to our count; anything higher adds that value to our count. We keep counting until we reach 1623. The sparks we need to lookup are guaranteed never to fall within one of the reduced sequences; the byte corresponding to 1623 will be a 0, 1, or 2, never anything higher; neighboring pixels which fall within a reduced sequence have, by definition, already been assigned and thus are never altered or added to the sparklist.

To lookup all eight neighbors we will need to scan through 3 strings (e.g. pixels 1622-1624 in strings 1229-1231). For added efficiency we could start at the end of a string and count down from 1800 whenever the horizontal value is greater than 900; that way we never have to count past the center of a string. A little more effort is required constructing this 1400 string shoreline space map, but the lookup procedure may be somewhat easier to implement. I'm not sure what the speed cost would be in C; it might not be that bad.

As I mentioned, I was essentially forced into using a hybrid of aproaches two and three. I use the shoreline map from approach two to locate the appropriate coordinate block, but I reduce the block itself by stripping non-seashore pixels from the end of each of the 13 lines in the block (it's 13 and not 14 because I use the 14th line (and 18th column) to form gridlines on the display map). This provided me with an additional savings of about 30% which was just enough to greatly simplify my division into multiple containers.

I hope this little lecture on storage alternatives has not put you to sleep. I hope to achieve actual smoothing sometime soon. Let me know what strategy you pick.

Yours in haste,

Epicurious