nanogui: Thread: propose low-level line draw?

Subject: propose low-level line draw?
From: Kaben Nanlohy ####@####.####
Date: 18 Dec 2000 21:56:28 -0000
Message-Id: <Pine.NEB.4.21.0012181329430.14684-100000@kaben.frye.com>

I've been doing lots of real-time graphing of the results of signal
processing, and I've found that the bottleneck in my applications is the
high-level linedraw routine of nano-X.

GdLine() handles bresenham, and calls GdClipPoint() and psd->DrawPixel()
for each pixel.  It's the stack setup and takedown that's eating processor
time.

In the source for GdLine() is the comment:

/*
 * For size considerations, there's no low-level bresenham line draw, so
 * we've got to draw all non-vertical and non-horizontal lines with
 * per-point clipping for the time being.
 */

Does "for the time being" mean that someone has ideas for a faster
linedraw that either replaces per-point clipping with something more
efficient, or that makes use of low-level bresenham in the driver?

I've been giving thought to such a low-level driver routine, and it seems
to me that the center of the problem is starting bresenham at the start of
a clipping region instead of the start of the line, while still drawing
all points of the line at the same positions as are drawn by the current
GdLine() function.

All of the state in bresenham is carried pixel-to-pixel in the decision
variable "rem" in GdLine(), in the slope variables "xdelta" and "ydelta",
and in the coordinate variables "x1" and "y1".  GdLine() could, at least
for rectangular clipping regions, prepare those five state variables
after clipping, and pass them along with "x2" and "y2" (as the other
clipping bounds) and "psd" and "line_color" to a low-level linedraw once
for each clipped line segment.
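
A minimal sketch of the idea in the paragraph above, under stated
assumptions: the type "line_state", the callback type "plot_fn", the
example logger "log_point", and the three functions are hypothetical names
made up for illustration, not part of nano-X.  The stepping rule follows
the usual integer Bresenham form with "rem" starting at half the major
delta; whether that matches GdLine() pixel for pixel would need checking
against the real source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the five state variables named above,
 * plus the step directions.  Not a real nano-X type. */
typedef struct {
    int x, y;            /* current point ("x1", "y1") */
    int xdelta, ydelta;  /* absolute slope components */
    int xinc, yinc;      /* +1 or -1 along each axis */
    int rem;             /* Bresenham decision variable */
} line_state;

/* Hypothetical per-pixel callback standing in for psd->DrawPixel(). */
typedef void (*plot_fn)(void *ctx, int x, int y);

/* Example callback context: remembers plotted points in order. */
typedef struct { int n; int xs[64]; int ys[64]; } point_log;

void log_point(void *ctx, int x, int y)
{
    point_log *pl = ctx;
    assert(pl->n < 64);
    pl->xs[pl->n] = x;
    pl->ys[pl->n] = y;
    pl->n++;
}

/* Set up the state at the first point of the line (x1,y1)-(x2,y2). */
void line_state_init(line_state *s, int x1, int y1, int x2, int y2)
{
    s->x = x1;
    s->y = y1;
    s->xdelta = abs(x2 - x1);
    s->ydelta = abs(y2 - y1);
    s->xinc = (x2 > x1) ? 1 : -1;
    s->yinc = (y2 > y1) ? 1 : -1;
    s->rem = (s->xdelta >= s->ydelta ? s->xdelta : s->ydelta) / 2;
}

/* Advance the state by one pixel along the major axis. */
void line_state_step(line_state *s)
{
    if (s->xdelta >= s->ydelta) {
        s->x += s->xinc;
        s->rem += s->ydelta;
        if (s->rem >= s->xdelta) { s->rem -= s->xdelta; s->y += s->yinc; }
    } else {
        s->y += s->yinc;
        s->rem += s->xdelta;
        if (s->rem >= s->ydelta) { s->rem -= s->ydelta; s->x += s->xinc; }
    }
}

/* Unclipped low-level run: plot npixels points starting at the current
 * state, leaving the state positioned just past the last plotted point. */
void line_draw_run(line_state *s, int npixels, plot_fn plot, void *ctx)
{
    assert(plot != NULL);
    while (npixels-- > 0) {
        plot(ctx, s->x, s->y);
        line_state_step(s);
    }
}
```

Because everything the algorithm needs is in the struct, a run that starts
from a saved mid-line state plots the same points as the corresponding
part of one uninterrupted run; that is the property a per-segment
low-level call would rely on.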

***

Someone with experience in optimizing graphics engines has probably
thought of all of this before, and with respect to nano-X.  Since I don't
have that kind of experience, I'd like to know what other people have
thought along these lines.  In particular would this set-up per line
segment waste more time than would be saved, are these ideas counter to
the design of the drivers, etc.

***

Thanks much, merry xmas -- Kaben

Subject: Re: propose low-level line draw?
From: "Greg Haerr" ####@####.####
Date: 18 Dec 2000 22:12:05 -0000
Message-Id: <001901c06940$b5023ca0$6817dbd0@censoft.com>

: GdLine() handles bresenham, and calls GdClipPoint() and psd->DrawPixel()
: for each pixel.  It's the stack setup and takedown that's eating processor
: time.

Make sure you #define NDEBUG in the fblinX.c driver for final
production code, it will run without any of the assert()'s.  What
bpp are you running?


: Does "for the time being" mean that someone has ideas for a faster
: linedraw that either replaces per-point clipping with something more
: efficient, or that makes use of low-level bresenham in the driver?
:
: I've been giving thought to such a low-level driver routine, and it seems
: to me that the center of the problem is starting bresenham at the start of
: a clipping region instead of the start of the line, while still drawing
: all points of the line at the same positions as are drawn by the current
: GdLine() function.

Likely, your problem is that the bresenham drawing is too slow,
even when drawing to an unobscured window.  If this is the case,
then when GdClipArea returns VISIBLE, we should go directly to
a driver-level routine that draws w/o clipping.  This will remove
almost all the setup/takedown time that you're seeing hurt performance.
It's unlikely that recoding the upper level routine as you suggest below
will help much, since usually you'll have one clip rectangle - your
window border.
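
A rough sketch of that dispatch, assuming a single inclusive clip
rectangle.  Everything here ("surface", "area_clip", "clip_area",
"clip_point", "draw_pixel", "walk_line", "draw_line") is a hypothetical
stand-in written for this example; it only mimics the roles of
GdClipArea, GdClipPoint and the driver's pixel routine and is not their
real interface.  The counters exist only to show how many per-point clip
tests each path performs.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-ins for the area clip results (values are arbitrary). */
typedef enum { AREA_VISIBLE, AREA_INVISIBLE, AREA_PARTIAL } area_clip;

typedef struct {
    int left, top, right, bottom;  /* one inclusive clip rectangle */
    int clip_tests;                /* per-point clip checks performed */
    int pixels;                    /* pixels actually drawn */
} surface;

int imin(int a, int b) { return a < b ? a : b; }
int imax(int a, int b) { return a > b ? a : b; }

/* Classify the bounding box of the line against the clip rectangle. */
area_clip clip_area(const surface *s, int x1, int y1, int x2, int y2)
{
    int l = imin(x1, x2), r = imax(x1, x2);
    int t = imin(y1, y2), b = imax(y1, y2);
    if (l >= s->left && r <= s->right && t >= s->top && b <= s->bottom)
        return AREA_VISIBLE;
    if (r < s->left || l > s->right || b < s->top || t > s->bottom)
        return AREA_INVISIBLE;
    return AREA_PARTIAL;
}

int clip_point(surface *s, int x, int y)
{
    s->clip_tests++;
    return x >= s->left && x <= s->right && y >= s->top && y <= s->bottom;
}

/* Stand-in for the driver pixel routine: just counts the pixel. */
void draw_pixel(surface *s, int x, int y)
{
    (void)x; (void)y;
    s->pixels++;
}

/* Bresenham walk; tests each point only when "clipped" is nonzero. */
void walk_line(surface *s, int x1, int y1, int x2, int y2, int clipped)
{
    int xdelta = abs(x2 - x1), ydelta = abs(y2 - y1);
    int xinc = (x2 > x1) ? 1 : -1, yinc = (y2 > y1) ? 1 : -1;
    int xmajor = xdelta >= ydelta;
    int steps = xmajor ? xdelta : ydelta;
    int rem = steps / 2;
    int i;
    for (i = 0; i <= steps; i++) {
        if (!clipped || clip_point(s, x1, y1))
            draw_pixel(s, x1, y1);
        if (xmajor) {
            x1 += xinc; rem += ydelta;
            if (rem >= xdelta) { rem -= xdelta; y1 += yinc; }
        } else {
            y1 += yinc; rem += xdelta;
            if (rem >= ydelta) { rem -= ydelta; x1 += xinc; }
        }
    }
}

/* One area test up front picks the unclipped or the per-point path. */
void draw_line(surface *s, int x1, int y1, int x2, int y2)
{
    assert(s != NULL);
    switch (clip_area(s, x1, y1, x2, y2)) {
    case AREA_INVISIBLE: return;
    case AREA_VISIBLE:   walk_line(s, x1, y1, x2, y2, 0); return;
    case AREA_PARTIAL:   walk_line(s, x1, y1, x2, y2, 1); return;
    }
}
```

In this sketch the fast path still plots pixel by pixel; in a real driver
the unclipped walk is where a bpp-specific inner loop would go.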



: Someone with experience in optimizing graphics engines has probably
: thought of all of this before, and with respect to nano-X.  Since I don't
: have that kind of experience, I'd like to know what other people have
: thought along these lines.  In particular would this set-up per line
: segment waste more time than would be saved, are these ideas counter to
: the design of the drivers, etc.

I have been working on just getting Microwindows basic draw capabilities
running, rather than getting it optimized.  The plan is to add either a
direct low-level screen driver entry point or an upper-level non-clipped
routine for speed when needed.  The drawback to adding another screen
driver entry point is that then everybody needs to write the routine in
order to bring up Microwindows on a new platform.  I've purposely kept as
many routines out as possible - because I want Microwindows to
be easy to port and easy to understand...

Regards,

Greg

Subject: Re: propose low-level line draw?
From: Kaben Nanlohy ####@####.####
Date: 18 Dec 2000 23:22:49 -0000
Message-Id: <Pine.NEB.4.21.0012181450090.14954-100000@kaben.frye.com>

On Mon, 18 Dec 2000, Greg Haerr wrote:
> I have been working on just getting Microwindows basic draw
> capabilities running, rather than getting it optimized.  The plan is
> to add either a direct low-level screen driver entry point or an
> upper-level non-clipped routine for speed when needed.  The drawback
> to adding another screen driver entry point is that then everybody
> needs to write the routine in order to bring up Microwindows on a new
> platform.  I've purposely kept as many routines out as possible -
> because I want Microwindows to be easy to port and easy to
> understand...

How about putting a "gen_linedraw()" into genmem.c, where gen_linedraw()
draws pixel-by-pixel using psd->drawpixel()?  Then set_subdriver() or
select_fb_subdriver() can fill-in a low-level linedraw with something that
works in all cases in which psd->drawpixel() works, and where an optimized
psd->linedraw() routine hasn't yet been written.

This still requires hoop-jumping for clipping, tho.  Unless as a special
case clipping isn't needed.
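
To make the fallback idea concrete, here is a small sketch.  The types
"screen_dev" and "subdriver", the 16x16 test framebuffer, "mem_drawpixel"
and "install_subdriver" are all assumptions invented for this example;
the real structures and the real set_subdriver()/select_fb_subdriver()
signatures are not reproduced here.  Only the name gen_linedraw() comes
from the proposal above.

```c
#include <assert.h>
#include <stdlib.h>

#define FB_W 16
#define FB_H 16

typedef struct screen_dev screen_dev;
typedef void (*drawpixel_fn)(screen_dev *psd, int x, int y, int c);
typedef void (*linedraw_fn)(screen_dev *psd, int x1, int y1,
                            int x2, int y2, int c);

/* Hypothetical screen device with a tiny one-byte-per-pixel buffer. */
struct screen_dev {
    drawpixel_fn drawpixel;
    linedraw_fn  linedraw;
    unsigned char fb[FB_H][FB_W];
};

/* Hypothetical subdriver table; linedraw may be NULL if a port has
 * not written an optimized version yet. */
typedef struct {
    drawpixel_fn drawpixel;
    linedraw_fn  linedraw;
} subdriver;

/* Example pixel routine for the test framebuffer (no clipping). */
void mem_drawpixel(screen_dev *psd, int x, int y, int c)
{
    assert(x >= 0 && x < FB_W && y >= 0 && y < FB_H);
    psd->fb[y][x] = (unsigned char)c;
}

/* Generic unclipped line: Bresenham on top of psd->drawpixel(), so it
 * works wherever drawpixel works. */
void gen_linedraw(screen_dev *psd, int x1, int y1, int x2, int y2, int c)
{
    int xdelta = abs(x2 - x1), ydelta = abs(y2 - y1);
    int xinc = (x2 > x1) ? 1 : -1, yinc = (y2 > y1) ? 1 : -1;
    int xmajor = xdelta >= ydelta;
    int steps = xmajor ? xdelta : ydelta;
    int rem = steps / 2;
    int i;
    for (i = 0; i <= steps; i++) {
        psd->drawpixel(psd, x1, y1, c);
        if (xmajor) {
            x1 += xinc; rem += ydelta;
            if (rem >= xdelta) { rem -= xdelta; y1 += yinc; }
        } else {
            y1 += yinc; rem += xdelta;
            if (rem >= ydelta) { rem -= ydelta; x1 += xinc; }
        }
    }
}

/* Fill in the device entry points, falling back to gen_linedraw()
 * when the subdriver supplies no linedraw of its own. */
void install_subdriver(screen_dev *psd, const subdriver *sub)
{
    psd->drawpixel = sub->drawpixel;
    psd->linedraw  = sub->linedraw ? sub->linedraw : gen_linedraw;
}
```

With this arrangement a new port only has to supply drawpixel; an
optimized linedraw can be dropped into the table later without touching
the callers.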

> Make sure you #define NDEBUG in the fblinX.c driver for final
> production code, it will run without any of the assert()'s.  What
> bpp are you running?

4bpp.

> Likely, your problem is that the bresenham drawing is too slow,
> even when drawing to an unobscured window.  If this is the case,
> then when GdClipArea returns VISIBLE, we should go directly to
> a driver-level routine that draws w/o clipping.  This will remove
> almost all the setup/takedown time that you're seeing hurt performance.
> It's unlikely that recoding the upper level routine as you suggest below
> will help much, since usually you'll have one clip rectangle - your
> window border.

I'm double-buffering graphs at about eight frames per second, using a
linear4_blit() that has been messily optimized for double-word copies
except at boundaries of the rectangle.

I blit from a portion of the offscreen pixmap to the display window.  As
long as my pixmap has more height than the range of my graphs, clipped
linedraws to the pixmap are not required.

For my purposes, at least, calling an unclipped psd->linedraw() from
GdLine() would be wonderful.

***

As soon as I have something that works, I'll post it to the list.

Thanks -- Kaben Nanlohy

Subject: Re: propose low-level line draw?
From: Morten Rolland ####@####.####
Date: 19 Dec 2000 09:07:49 -0000
Message-Id: <3A3F1B08.2344D5EA@screenmedia.no>

Kaben Nanlohy wrote:
> 
> How about putting a "gen_linedraw()" into genmem.c, where gen_linedraw()
> draws pixel-by-pixel using psd->drawpixel()?  Then set_subdriver() or
> select_fb_subdriver() can fill-in a low-level linedraw with something that
> works in all cases in which psd->drawpixel() works, and where an optimized
> psd->linedraw() routine hasn't yet been written.
>

This is what should ultimately be done for all the drawing functions in
the low level driver that are more complex than get/set pixel (and maybe
hline and some sort of scanline for speeding up unoptimized GdArea/Image).
The benefits would be:

   1) The general functions would be regarded as the definition of a
      certain low level drawing operation and would help people
      understand the low level code and be a guide when writing
      optimized versions.  They could also be used to check the operation
      of an optimized version in a regression test setup.

   2) Move all the drawing stuff out of engine/devdraw.c - the way it is
      now is not exactly beautiful -- e.g. GrArea calls a low level
      driver to paint rectangles at a time (for clipping) when 16 bit
      colours are used (our setup), and falls through to a more general
      but much slower version for other colour depths.
      All code in engine/devdraw.c that changes pixel values should be
      moved into a corresponding (possibly new) low level operation,
      and care should be taken to make it possible to do clipping
      efficiently (e.g. big areas and multiple pixels at a time).

   3) Make it a requirement that all new drawing primitives be implemented
      as general low level drawing functions first so that their operation
      can be discussed and tested on all platforms and bit depths.
      This will greatly improve the tangled way of introducing new
      features into this part of the system.

Note: I haven't looked much at the code in the most recent versions.

> This still requires hoop-jumping for clipping, tho.  Unless as a special
> case clipping isn't needed.

Yes... Your situation is very similar to ours when we optimized the GdArea
function for 16 bit colours.  We implemented a low level painter that was
sufficiently flexible to paint only parts of the original image, thus
making it possible to walk the clipping rectangles for the area to be
painted and call the low level driver for the visible parts only.

This is easier for GdArea-type functions than for line drawing as you
describe it, since for lines the order in which you walk the rectangles
matters, and you do not always know which rectangle will be next until
after a (possibly hidden) low level linedraw has been performed to update
the state of the "line draw".

For the low level GdArea we used a "hw_gc_t" struct to carry the
information to the low level driver in an easy to extend way, and
the same setup can be extended to hold the state of the line
draw operation and bring updated coordinates back to the devdraw.c
function for use when calculating the size of the next rectangle,
visible or not.  The optimized versions of the low level driver
should simply skip along when only updating the state.  This needs
more math but finishes hidden segments faster (watch this effect
in any good windowing system by hiding parts of a "line draw active"
window and watch the still visible lines speed up).
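
As an illustration of "skipping along", the sketch below advances a
Bresenham state by n steps with one multiply and one divide instead of n
loop iterations.  The struct "line_state" and the functions "line_step"
and "line_skip" are hypothetical names for this example; the actual
layout of hw_gc_t is not shown in this thread and is not assumed here.
The closed form relies on the invariant 0 <= rem < major delta, which
holds when "rem" starts at half the major delta and is updated as in
line_step().

```c
#include <assert.h>

/* Hypothetical line state; not the real hw_gc_t. */
typedef struct {
    int x, y;            /* current point */
    int xdelta, ydelta;  /* absolute slope components */
    int xinc, yinc;      /* +1 or -1 along each axis */
    int rem;             /* Bresenham decision variable */
} line_state;

/* Reference version: one pixel step, as a drawing loop would do it. */
void line_step(line_state *s)
{
    if (s->xdelta >= s->ydelta) {
        s->x += s->xinc;
        s->rem += s->ydelta;
        if (s->rem >= s->xdelta) { s->rem -= s->xdelta; s->y += s->yinc; }
    } else {
        s->y += s->yinc;
        s->rem += s->xdelta;
        if (s->rem >= s->ydelta) { s->rem -= s->ydelta; s->x += s->xinc; }
    }
}

/* Skip n steps without touching any pixels.  After n major-axis steps
 * the accumulated remainder is rem + n*minor; every full "major" in it
 * corresponds to one minor-axis step.  Zero-length lines (major delta
 * of 0) have nothing to skip and are left unchanged. */
void line_skip(line_state *s, int n)
{
    int xmajor = s->xdelta >= s->ydelta;
    int major = xmajor ? s->xdelta : s->ydelta;
    int minor = xmajor ? s->ydelta : s->xdelta;
    long long total;
    int minor_steps;

    assert(s->rem >= 0);
    if (n <= 0 || major == 0)
        return;
    total = (long long)s->rem + (long long)n * minor;
    minor_steps = (int)(total / major);
    s->rem = (int)(total % major);
    if (xmajor) {
        s->x += n * s->xinc;
        s->y += minor_steps * s->yinc;
    } else {
        s->y += n * s->yinc;
        s->x += minor_steps * s->xinc;
    }
}
```

The upper-level code can then use the updated x and y to work out which
clip rectangle the line enters next, whether or not the skipped segment
was visible.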

Have a look in driver/fblin16.c and engine/devdraw.c (GdArea) for
more info on how we did our optimizations.

Personally I think extending and optimizing nano-X will be
difficult unless a clean separation of the low level drawing and
the upper level clipping smarts is done, and although this is not
a prime concern for us in the short run, I see it as crucial to
make an effort in this area, and I'm prepared to help.

Regards,
Morten Rolland, Screen Media