Despite everything I said in my previous post, I am now going to write on a topic that would be of interest to people in Japan, and even more ironically, I'm going to type it on the computer because it's one of those topics I can ramble on about without being too picky about my words. See how that works? Alrighty, let's get started.
The topic of today: The current and future technology that will be powering the production of animation, as well as its effects on the overall quality of said product. Since I am a technical kind of guy and this is my line of work, I'll be giving you some fascinating insider details. (Hint, hint, that means a long blog post.)
The Magic of 2D
2D worlds look really cool. They ignore physics. When you see Coyote hide behind a tree thinner than he is, or when he disappears into the distance by simply shrinking on the screen, it's obvious the physics are completely different. But the most beautiful animation is the stuff that reminds you of the real world without capturing reality. Instead, you find yourself looking through a window into someone else's world rather than having it occupy yours. There's a huge difference between the surrealism of Miyazaki's My Neighbor Totoro and the buffoonish but obviously 3D appearance of The Incredibles. The former has a magic in its world that the latter significantly lacks.
There are a number of reasons for this, but the most notable are these: shading, perspective, entity representation, and flow of motion. Let’s go over them one at a time.
In the real world, there are no outlines. We recognize independent objects by their different colors and the fact that we've seen them before. Thus, it's possible for things to simply blend into the background. In cartoons, everything has a border. That's what defines it as a cartoon or comic. Thus, things stand out. But most importantly, a border immediately establishes a new appearance for the world. Therefore, it's essential these outlines be beautiful; otherwise, the world is ugly.
When you look at a 2D world, the perspective you usually associate with it is one of “flatness” whereby distance is determined by occupancy of a “layer” deemed to be a particular distance based on the size of objects occupying it. Characters and entities travel between layers by scaling up or down.
If the perspective is too strong (represented by the scaling of objects being too strong), the world will appear as obviously fake. (Just imagine Yosemite Sam chasing Bugs Bunny into the sunset.) However, if the perspective is too realistic, the audience will recognize it as being truly 3D and associate it with the real world rather than a different one.
The ideal balance is to be subtle about the stretch of the perspective such that the enjoyable aspects of both are present. The developers at ClipStudio realized this and designed their software to enlarge nearby 3D objects for a "manga style" or "anime style" when the user wants it. I've seen the technique used in anime, but used ineffectively, and I'd say the director plays a major role in whether it works.
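To make the layer idea concrete, here's a toy sketch of how a "perspective strength" knob might scale 2D sprites by layer. The layer names, depths, and the `exaggeration` parameter are my own illustration, not any particular software's API:

```python
# Hypothetical layer depths for a flat "multiplane" scene (made-up numbers):
LAYERS = {"foreground": 1.0, "midground": 2.0, "background": 6.0}

def sprite_scale(layer, exaggeration=1.0):
    """Scale a 2D sprite to fake its distance from the viewer.

    exaggeration=1.0 gives the realistic 1/depth falloff; larger values
    give the cartoonish shrink-into-the-sunset effect, while smaller
    values flatten the scene toward layered 2D.
    """
    return 1.0 / (LAYERS[layer] ** exaggeration)

print(sprite_scale("midground"))                    # realistic: 0.5
print(sprite_scale("midground", exaggeration=2.0))  # exaggerated: 0.25
```

A character "traveling between layers" is then just the same sprite redrawn with a new scale, which is exactly the shrinking-into-the-distance trick described above.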
As an important side note, I’d say humans have their own concept of what “another world” looks like. Whenever I try to draw 3D objects, my perspective is tailored more towards visual effect than 3D accuracy, so when I line up my perspective grids (in whatever art software I’m using), I’m usually horribly off. But that doesn’t matter if you prefer it that way. Capture feel.
As you know, most cartoons use simple base colors. Textures are also relatively simple in cartoons. That doesn't mean they aren't pretty, but they are hard to get right, and the details in them go unnoticed by most people. What's more troublesome is drawing what most people do notice: shadows.
To make shadows easier to draw, artists resorted to drawing only basic, simple-outline shadows. (Real shadows have a blur along their edges due to light transport effects, but that’s not pretty unless you go all out and color everything else perfectly.) As digital technology progressed and enabled artists to toy more with color, basic shadows took on a life of their own and became a place to play with color.
Artists have toyed with other shadows and lighting in order to get something visually appealing. Pursuing such aestheticism resulted in physically inaccurate shading and lighting, but at least the results were gorgeous.
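As a rough sketch of the two-tone, hard-edged shading described above, here's a minimal cel shader. The colors, threshold, and function are my own illustration of the idea, not any studio's pipeline:

```python
def cel_shade(normal, light_dir, base_color, shadow_color, threshold=0.3):
    """Two-tone 'anime' shading: a hard cut between lit color and shadow
    color instead of a smooth physical falloff.

    A minimal sketch; real pipelines add more bands, rim light, and
    hand-placed shadow shapes.
    """
    # Lambert term: cosine of the angle between surface normal and light.
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return base_color if n_dot_l > threshold else shadow_color

skin = (255, 224, 196)
skin_shadow = (186, 140, 160)  # stylized shadow hue, not just darkened skin

print(cel_shade((0, 0, 1), (0, 0, 1), skin, skin_shadow))  # facing light: lit
print(cel_shade((1, 0, 0), (0, 0, 1), skin, skin_shadow))  # edge-on: shadow
```

Note that the shadow color is a free artistic choice rather than a physically darkened base color, which is exactly where basic shadows "took on a life of their own" as a place to play with color.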
Flow of Motion
In another universe, things move weird. In cartoons, characters are drawn in a way that’s easiest and fastest and often very, very exaggerated. Ironically, this adds a great deal of charm to animation. When you see Coyote flattened like a piece of paper, you know it’s impossible, but it’s amusing nonetheless. Characters in cartoons can swing their arms with lightning speed, run up a short hill for hours on end, or contort their body in a snakelike manner without so much as a house of mirrors in the film.
Anime tends to be more subdued (unless you’re watching Nichijou), but the animators themselves can’t help but take shortcuts (usually by cutting out frames) that result in non-physical, erratic movement. The erratic part is key. It signals to the audience that this new world on the TV screen doesn’t at all obey the physical laws of motion as we know them, and that’s quite charming.
The magic formula is thus Entity Representation + Perspective + Shading + Erratic Motion. Of course, these things are easier to implement by drawing with pencil on paper and then coloring on a computer, but there are problems with that: time and money.
The Price of 2D
Let’s go over some numbers.
For hand-drawn animation, it takes about 30 minutes to draw a frame and 10 minutes to color it, according to videos by Kyoto Animation and my own personal experience (less believable, but hey, I put it out there). Animators are thus expected to put out about 300 frames a month, though it used to be 500. (The drawings become more difficult too, though.) On average, animation videos run at 12 frames per second (fps) – 8 fps for slow shots, 1 fps for panorama shots, and 24 fps for “the money shot” (action scenes). For our calculations, let’s assume a modest animation episode length of 20 minutes.
300 frames per month ÷ 12 frames per second = 25 seconds of footage per month
20 minutes of footage × 60 seconds per minute = 1,200 seconds
1,200 seconds ÷ 25 seconds per month = 48 months = 4 years
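The arithmetic above, spelled out as a quick sanity check:

```python
# Production-rate figures from the text above.
frames_per_month = 300
fps = 12                  # average frame rate of the footage
episode_minutes = 20

seconds_per_month = frames_per_month / fps           # 25 s of footage/month
months = episode_minutes * 60 / seconds_per_month    # 48 months

print(months / 12)  # 4.0 years of one animator's labor for one episode
```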
In other words, it will take you 4 years of manpower to create a single animation 20 minutes in length. That's assuming you don't eat, sleep, go outside, get sick, or do anything else for the next 4 years. Not happening. Obviously, bringing on more staff cuts the workload immensely, but it's still a very long time. As Miyazaki discovered, working out of the garage was fun, but he had to take on more employees to get more work done, which meant taking on more jobs to pay those employees, which meant hiring more people to handle the new jobs.
The cost is rather steep, too. Animators in Japan are only paid the equivalent of around $2.75 per drawing (based on what I calculated per half-hour of salary), and yet 2D animation doesn't create very many reusable assets. It's a one-and-done deal. While new technology might put some people out of a job, the fact is, most people can't sustain a living at that wage anyway, so animation companies have trouble finding people.
The popularity of animation has made it possible to maintain the production quotas necessary to keep the existing animation studios alive and even allow new ones to open. However, when harder economic times hit, these places aren't going to be sustainable. The solution? More 3D.
Applying 3D Solutions
3D animation is supposed to be the wave of the future. Many, many studios are using 3D because they get beautiful results and it's cheaper in the long run. The most expensive part is the up-front cost to make the assets (3D models like cars and buildings and other entities). Once the assets are made, each minute of footage becomes cheaper and cheaper to produce. 2D animation studios have noticed the cost savings, so they have been progressively incorporating 3D into their work.
But there’s a problem: It’s ugly.
Most people don't care about cars, machines, and other "non-organic" entities, but even there the problem is noticeable. ("Organics", in my artistic dictionary, means entities with flexible or curvy bodies and structures, usually composed of cells (like plants) and/or having skin (like humans, animals, and aliens).) 3D technology often violates the principal features I discussed above. How so? First we need some background on 3D techniques.
Overview of Techniques of 3D Engines
3D engines were built for 3D, not 2D, but since the 2D world is often meant to mimic the 3D world, it would seem like we could reproduce 2D using 3D tools. Since 2D cartoons use their own physics instead of the true laws of physics, the rendering of cartoons is referred to as Non-Photorealistic Rendering, or “NPR”.
Aside from simple base colors, NPR is surprisingly difficult to do for a number of reasons.
Borders – Entity Representation
The first step in NPR is mimicking the entity representation of borders around objects. This is done by creating outlines based on the z-depth of pixels in the rendering buffer. What does that mean?
When a 3D engine (part of a computer program) draws a 3D image on a screen, it has to determine the depth (distance from the user’s view) of the surface it’s drawing at every pixel on the screen so that it can use that information for determining what gets drawn in front or behind that surface. This depth is saved in a secondary image called the “z-buffer”, and the values in it are “z-depth” values.
An alternative solution is using the vertex or surface normals. Without being too technical, we can say that a “normal” is the direction something faces. For example, surfaces that are pointed perpendicular to the user’s view are usually at the edge of an object and should have a border.
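As a toy illustration of the z-depth approach (my own sketch; as noted, real engines also check normals and material IDs):

```python
import numpy as np

def depth_outlines(z_buffer, threshold=0.1):
    """Mark a pixel as outline wherever depth jumps sharply relative to
    its left or top neighbor.

    This is the crude idea behind z-depth edge detection; the threshold
    here is arbitrary, and production renderers use fancier filters.
    """
    dz_x = np.abs(np.diff(z_buffer, axis=1, prepend=z_buffer[:, :1]))
    dz_y = np.abs(np.diff(z_buffer, axis=0, prepend=z_buffer[:1, :]))
    return (dz_x > threshold) | (dz_y > threshold)

# A 4x4 depth buffer: a near object (z = 1) in front of a far wall (z = 5).
z = np.full((4, 4), 5.0)
z[1:3, 1:3] = 1.0
print(depth_outlines(z).astype(int))  # a ring of 1s around the near object
```

The output is exactly the kind of uniform, one-pixel-wide border the next paragraph complains about: the math finds the edge, but it has no opinion about where a line should swell or taper.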
Basic implementation results: Ugly. In nearly every implementation of line art generated by 3D tools, the results for organics are hideous (notably choppy due to the variety of surfaces), and the results for non-organics (mechanical things, static things, etc.) are bland. The line art in cartoons is designed for aesthetic appeal, not true bends or edges. Some lines and curves are thicker, and many lines and curves tend to "trail off" from thick parts into needle points.
My own experience goes back to using Inkscape, where every line is the same size. When I switched to using ClipStudio, I found that I could replicate my hand-drawn line styles much better and the results were fantastic. Currently, this isn’t done in 3D because, well, there’s no math formula for aestheticism. It has to be faked somehow.
Check out the latest animation from the best companies in the world. If they are using 3D and trying for a cartoon style, I can just about guarantee you that the outlines of their entities are solid, one-width lines.
Some 3D engines, such as Blender with its Freestyle renderer, have the power to render different styles of line art, but the results are… unsatisfactory, to say the least. On the bright side, the line art can be colored based on the colors of the surfaces it encloses.
The coloring of line art is important for an awesome appearance, but it's not essential early in the process. Moreover, solid black looks better than any other solid color. Coloring line art correctly requires being able to give it gradient colors, which is something rendering engines simply aren't able to do just yet.
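To illustrate what gradient-colored line art would mean, here's a hypothetical blend from solid black toward the enclosed surface's color along a stroke. The function and numbers are my own sketch of the idea, not any renderer's API:

```python
def line_color(surface_rgb, t):
    """Color of a stroke at position t in [0, 1] along its length:
    solid black at t=0, fading toward the enclosed surface's color at t=1.

    A toy illustration of gradient line art, not a real engine feature.
    """
    return tuple(round(c * t) for c in surface_rgb)

# A stroke fading from black at one end toward a red surface at the other:
print([line_color((200, 40, 40), t) for t in (0.0, 0.5, 1.0)])
```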
Rasterizing vs Raytracing and the Problem of Shading
To render 3D objects, computers use either rasterizing or raytracing. The former is like painting: you draw everything you think is in the scene, one object at a time, but you run into ordering issues (which object comes first) when two objects overlap in distance from the viewer. In raytracing, you walk through a dark room along nearly parallel lines, and when you bump into something, you color a point on the canvas (the point representing that path through the room) based on what you just bumped into. That's the layman's description. Raytracing is – even in this analogy – very slow. You don't want to ram into anything, so you move slowly. The benefit is that you are able to correctly draw the scene, pixel by pixel.
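To make the "bump into something" step of the analogy concrete, here's a minimal ray-sphere intersection test (a standard textbook formula, simplified to a hit/miss answer; scene numbers are my own):

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """Does a ray from `origin` along unit vector `direction` hit a sphere?

    Solves the quadratic |origin + t*direction - center|^2 = radius^2
    for a real t >= 0. One such test runs for every pixel of the image.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4 * c             # a = 1 because direction is unit length
    if disc < 0:
        return False                 # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2   # nearest of the two intersections
    return t >= 0                    # hit must be in front of the origin

# Camera at the origin looking down +z toward a sphere at z = 5:
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # True
print(ray_hits_sphere((0, 0, 0), (0, 1, 0), (0, 0, 5), 1.0))  # False
```

Even this one-sphere version hints at why raytracing is slow: a full frame repeats tests like this for every pixel against every object (or an acceleration structure over them), plus secondary rays for shadows and reflections.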
Rasterizing is much, much faster than raytracing. Therefore, you might think it would be more suited to animation. (After all, we need tons of frames very quickly.) However, the problem is that rasterizing doesn’t correctly handle shadows. You have to fake them, but the fake shadows from rasterizing aren’t the same as the fake shadows from an artist. Instead, they are rather boxy looking. Moreover, they start to become slow to make and use depending on the complexity of the entities being rendered.
Raytracing correctly creates the shadows we need – often called "hard shadows" because the edges of the shadow are hard. As you would expect, this is what professional software uses. Recent techniques for performing raytracing have sped up the process to achieve near real-time results, so the time delay is no longer much of an issue (unless you're using a dinosaur like moi).
However, raytracing is at a disadvantage when it comes to fast coloring of objects. With rasterizing, we have the ability to create "masks" – areas of an image where color effects are limited. These masks allow the easy recreation of shadows, unusual or extra lighting effects, and alternative coloring in a 2D setting. Raytracing can also accomplish these effects, but its solutions are either different in appearance or add processing time and program complexity to achieve the same thing.
One last alternative is hybrid raytracing, which combines the shading benefits of raytracing with the speed of rasterizing. However, it still has speed problems from raytracing and z-depth (z-order) problems from rasterizing, so it doesn't solve the problem anyway.
Perspective
Creating the charm of the subtle perspective stretch isn't hard in theory, but in practice, it doesn't always look the way you want.
Early anime that used 3D didn’t use perspective stretch or didn’t use it effectively, resulting in scenes where the third dimension of space was very obvious. Some techniques to hide the third dimension include using generic backgrounds (premade as 2D or simply devoid of entities that would make the audience aware of scale) and using a simplified color scheme for rendering objects. Both of them cheapen the overall appearance of the final footage, making them not worth it in the long run.
What would work better is adjusting the perspective. How? Under the hood, 3D engines perform calculations for where to place the “camera” (determining the user’s view, the “viewport”) as well as how to render the scene relative to the camera. For raytracing, this is simply a matter of determining a new starting direction of rays from the camera or light sources. Each ray from the camera determines the color of a pixel by moving through the scene until it hits something. For rasterizing, it means adjusting the positions of mesh vertices in the scene. (Meshes are representations of entities using vertices (points) and faces (surfaces drawn between these points).)
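As a sketch of the rasterizing side, here's a toy pinhole projection with a hypothetical `stretch` parameter that exaggerates the depth falloff. The parameter is my own illustration of the adjustment described above, not any engine's actual API:

```python
def project(vertex, focal=1.0, stretch=1.0):
    """Project a camera-space vertex (x, y, z), z > 0, onto the screen.

    stretch=1.0 is the standard pinhole projection x' = f*x/z; raising
    `stretch` makes distant vertices shrink faster than reality, the way
    a 'manga style' camera might. (The parameter is illustrative only.)
    """
    x, y, z = vertex
    denom = z ** stretch
    return (focal * x / denom, focal * y / denom)

print(project((2.0, 1.0, 4.0)))               # standard perspective
print(project((2.0, 1.0, 4.0), stretch=1.5))  # exaggerated depth
```

In a real pipeline this warp would live in the vertex transform (or, for a raytracer, in how camera ray directions are generated), but the principle is the same: nudge where geometry lands on screen away from strict 3D accuracy and toward the stylized stretch.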
Technical note: There are two kinds of raytracers – one that determines pixel color by tracing the path of light from the light sources in a scene and one that determines pixel color by tracing the path of light from the camera to the light sources. The latter is faster.
The Flow of Motion
Up until people started using computers to create cartoons, the flow of motion was one of the magical aspects of cartoons that animators may not have realized could be a problem.
Perfection is not what people want to see, and computer animation is too perfect. When a character turns or moves, it shouldn't look consistent. It should be erratic, but not just any erratic: a special kind of erratic. Characters need to slink, not slide; jump, not leap; slam, not swing an invisible hammer.
While it’s true that the animation of characters and creatures is left up to the animators themselves, 3D engines have a role to play.
First, any time motion happens in a 3D engine, the reference points on an entity (the points that stand out, like the nose or eyes) make you aware of the 3D nature of the image, thereby destroying the charm of looking through that window into a new world. In hand-drawn animation, the motion is consistent, but wrong: the proportions of things aren't correct.
Consider, for example, the rotating of a head. In hand-drawn anime or American cartoons, you can’t tell exactly what angle the head is rotated from the camera. You can tell that it’s rotated or facing, but the positions of the nose or eyes on the head don’t give you perfect clues as to how much of a rotation you’re seeing. Only when the character rotates their head do you finally see where the rotation transitions from looking forward to looking perpendicular to the audience. In essence, hand-drawn animation allows us to capture everyone “on their good side” all the time. In 3D engines, this charm is lost because everything is drawn exactly where it should be. You get to see the “ugly angle”, but worse – you can tell the angle of rotation, and that brings you back to an awareness of the 3D world.
In case you haven’t noticed, I continue to emphasize the importance of preventing people from being aware of the third dimension. You’re supposed to look into the world, not feel like it’s in you’re living room. The animation should feel like an escape, a window into a new world, not an extension of your own.
It's in this regard that many animations have failed. RWBY by Rooster Teeth, for example, fails miserably. Aside from the outright neglect of shading and perspective (at least in earlier footage), the motion of the characters really detracts from the charm the show might otherwise have.
3D technology is, in fact, moving along and looking good in certain areas, resulting in technology we’re calling “Photo Surreal Rendering” (PSR). In part 2, I’ll talk more about that.
That’s enough reading for now. More technical details in part 2!