Contributor Newsletter #5

In this issue
-------------
 - Forming titles
 - Character names
 - Locations
 - Writing credits
 - Cinematographer vs. director of photography
 - Processing cycle update
 - EPISODECORRECT-GUEST
 - Revised name filmographies
 - UNIX tool releases: 3.20, 3.21

Forming titles
--------------
One continuing source of confusion for people contributing new titles is
how to format them.  We have several precise rules (and some that
are admittedly a bit less precise).

The basic rules for forming a title: TV series and mini-series should be
enclosed in quotation marks, and anything other than a TV series or
theatrical movie needs a description added to the end: (TV) for a
made-for-TV movie, (V) for direct-to-video, (VG) for a video game, and
(mini) for a mini-series.  Please note that quotation marks can only be
used for TV series and mini-series; there's no such thing as a "video
series", for example.

Other factors, such as whether the title is a documentary or short film,
should not be included in the title (unless they appear in the title
on screen).  That brings us to the next rule: the primary title is the
title as it appears on screen, and if it differs between the beginning
and end of the film, then it's the title at the beginning of the film.
Oddities such as substituting numbers for letters (e.g., Se7en) and
intentional misspellings should be preserved.  In English, a subtitle
is set off from the main title with a colon (e.g., Lord of the Rings:
The Fellowship of the Ring, The); as that example shows, a leading
article moves to the very end of the title.  The exception to the
article rule is when the title is in a different language from the
movie, as with Les Girls, El Cid, or La Bamba; those articles stay in
front.  Also, the French articles un/une/des and the Portuguese articles
um/uma do not move.  (In German, a subtitle is separated with a hyphen.)
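The article rule is mechanical enough to sketch in a few lines of Python.  This is an illustration only: the function name and the per-language article sets are our own assumptions (the issue itself only names the English articles and the French and Portuguese exceptions), and elided forms such as L' are not handled.

```python
# Sketch of the "leading article moves to the end" rule.
# ASSUMPTION: these article sets are illustrative, not an official list.
# French un/une/des and Portuguese um/uma are deliberately absent: they never move.
ARTICLES = {
    "en": {"the", "a", "an"},
    "fr": {"le", "la", "les"},
    "es": {"el", "la", "los", "las"},
    "pt": {"o", "a", "os", "as"},
}


def move_leading_article(title: str, film_language: str) -> str:
    """Move a leading article to the end, e.g. 'The X' -> 'X, The'.

    Only an article in the film's own language moves, so Les Girls and
    El Cid (English-language films) keep their articles in front.
    Articles later in the title (e.g. after a colon) are left alone.
    """
    first, sep, rest = title.partition(" ")
    if sep and first.lower() in ARTICLES.get(film_language, set()):
        return f"{rest}, {first}"
    return title


print(move_leading_article("The Lord of the Rings: The Fellowship of the Ring", "en"))
# Lord of the Rings: The Fellowship of the Ring, The
print(move_leading_article("Les Girls", "en"))  # Les Girls
```

Keying the lookup on the film's language, rather than on the word alone, is what makes the Les Girls exception fall out naturally.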

One exception to the "title as it appears on screen" rule: Author or
filmmaker possessives such as Bram Stoker's Dracula or Disney's The Kid or
Andy Warhol's Flesh are used only in alternate titles with the attribute
(complete title).  This doesn't apply to working titles, like Woody
Allen Fall Project 2000, since that's not a possessive in the same sense.

The primary title is the original title of a movie in its original
language.  If the movie is a coproduction that uses several languages,
pick the dominant language in terms of dialogue, director, cast, and
principal crew.  If no language dominates, pick any one.
The title should be that used at the first public screening; a film can
have a different title at film festivals from when it goes into general
release, or be retitled on rereleases.  It's also common for titles to
be changed for television and video.  Again, these should be treated as
alternate titles with the appropriate attributes.  Other commonly used
titles, such as those on posters or reference books, should also be sent
as alternate titles.

Accents should appear as they are on screen, except that accents
omitted over upper-case letters should be restored.  Please note that
accents should be limited to those in the ISO-8859-1 character set; this
means that some accents from languages such as Turkish will have to be
omitted.  Languages using non-Roman alphabets should be transliterated;
for Japanese, use Hepburn romanization (Hyōjun-shiki).  If the title is
transliterated in the original release (films from India and Hong Kong
often include English subtitles in the original), use the on-screen
transliteration.
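Whether a title stays within ISO-8859-1 can be checked mechanically, since that character set covers exactly the code points U+0000 to U+00FF.  A small Python sketch follows; the helper names are ours, and the Turkish sample string is just an illustration.

```python
def latin1_violations(title: str) -> list[str]:
    """Return the characters in `title` that fall outside ISO-8859-1."""
    return [ch for ch in title if ord(ch) > 0xFF]


def fits_latin1(title: str) -> bool:
    """True if the title can be encoded in ISO-8859-1 as-is."""
    try:
        title.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("Amélie"))                 # True
print(latin1_violations("Kış Uykusu"))       # ['ı', 'ş']
```

Any character reported by `latin1_violations` is one whose accent would have to be omitted under the rule above.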

The year used should be the year of first public exhibition of
the final version, whether that was at a film festival, general
release, television showing, or whatever.  If that year is not known,
approximations from other sources can be used, such as copyright date.
If no good approximation is available, use ???? for the year.  If two
titles have the same name and year, they should be distinguished with
Roman numerals (e.g., Hamlet (2000/I)).  For this purpose, articles,
punctuation, and title type are ignored; thus, Magicians, The (2000/I)
(TV) and Magicians (2000/II) are numbered as a pair.  There are cases
where the /I might be omitted, but it's best to leave those decisions to us.
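Putting the pieces together, the sketch below assembles full titles: quotation marks for series and mini-series, the year (or ????), a Roman numeral when two titles collide, and the type suffix.  It is a hedged illustration, not our processing code: the internal kind names are hypothetical, only English trailing articles are stripped when comparing titles, and numerals are handed out in input order, whereas in practice that ordering is our call.

```python
import re
from collections import defaultdict

# ASSUMPTION: the kind names are ours; the suffixes are the ones listed above.
SUFFIXES = {"movie": "", "series": "", "tv": " (TV)", "video": " (V)",
            "videogame": " (VG)", "mini": " (mini)"}
QUOTED = {"series", "mini"}  # only TV series and mini-series get quotation marks


def to_roman(n: int) -> str:
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def collision_key(title: str, year):
    """Ignore a trailing (moved) English article, punctuation, and case."""
    base = re.sub(r", (The|A|An)$", "", title)
    base = re.sub(r"[^\w\s]", "", base)
    return (" ".join(base.lower().split()), year)


def format_titles(entries):
    """entries: (title with article already moved, year or None, kind)."""
    groups = defaultdict(list)
    for i, (title, year, _kind) in enumerate(entries):
        groups[collision_key(title, year)].append(i)
    out = [""] * len(entries)
    for indices in groups.values():
        for n, i in enumerate(indices, start=1):
            title, year, kind = entries[i]
            stamp = str(year) if year is not None else "????"
            if len(indices) > 1:  # same name and year: add /I, /II, ...
                stamp += "/" + to_roman(n)
            name = f'"{title}"' if kind in QUOTED else title
            out[i] = f"{name} ({stamp}){SUFFIXES[kind]}"
    return out


print(format_titles([("Magicians, The", 2000, "tv"), ("Magicians", 2000, "movie")]))
# ['Magicians, The (2000/I) (TV)', 'Magicians (2000/II)']
```

Because the title type is left out of the comparison key, the TV movie and the theatrical film collide and are numbered together, as in the example above.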

A TV special should be classified as a TV movie, but with the keyword
tv-special.  That keyword, as well as the Documentary and Short genres,
causes special treatment of the title in some filmography listings.
For our purposes, a mini-series runs at least 240 minutes excluding
commercials; anything shorter is a TV movie, regardless of the number
of parts it is divided into (though in some cases it might be a series,
depending on the circumstances).  Direct-to-video and TV movie status are
determined by intent; for example, Theodore Rex (1995) went straight to
video, but it was intended for and budgeted as a theatrical release, so it
is not marked (V).
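The 240-minute threshold can be expressed as a tiny Python check.  This sketch assumes runtimes in minutes with commercials already excluded, and it deliberately ignores the judgment calls (intent, or a production that is really a series) that no formula captures; the function name is hypothetical.

```python
# ASSUMPTION: runtimes are in minutes, with commercials already excluded.
MINI_SERIES_MIN_MINUTES = 240


def tv_suffix(part_runtimes: list[int]) -> str:
    """Return '(mini)' if the parts total at least 240 minutes, else '(TV)'.

    Only the total running time matters, not the number of parts.
    """
    total = sum(part_runtimes)
    return "(mini)" if total >= MINI_SERIES_MIN_MINUTES else "(TV)"


print(tv_suffix([90, 90]))    # (TV): two parts, 180 minutes
print(tv_suffix([120, 120]))  # (mini): 240 minutes
```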

The rules for capitalization depend on the language of a title, not on
what appears on screen, since the on-screen capitalization is often
chosen for design reasons.  In English, "book" rules apply: All words are
capitalized except for articles and most prepositions and conjunctions
of four or fewer letters; the first word, and the first word after a
colon, period, exclamation point, or question mark are always capitalized
(and the second word, if the first is an article).  Exceptions are made,
rarely, when needed for clarity -- for example, BUtterfield 8, where the
first two letters represent a telephone exchange.  Portuguese, Hebrew,
and Indian languages use the same rules.  In most other languages, only
the first word (first two, if the first is an article), proper names,
and the first word after certain punctuation as above are capitalized.
German follows its usual rules, under which all nouns are capitalized.
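The English "book" rules can be approximated in Python as below.  Treat it as a rough sketch: the list of short prepositions and conjunctions is our own assumption (the rule says "most" such words, so a real list needs editorial judgment), the input should be in natural word order (before the article is moved), and an abbreviation ending in a period would wrongly force the next word to be capitalized.

```python
import re

ARTICLES = {"a", "an", "the"}
# ASSUMPTION: an illustrative list of short prepositions and conjunctions.
SMALL_WORDS = ARTICLES | {
    "and", "as", "at", "but", "by", "for", "from", "in", "into", "nor",
    "of", "on", "onto", "or", "over", "per", "to", "via", "with", "yet",
}
SENTENCE_BREAKS = (":", ".", "!", "?")


def capitalize_first_letter(word: str) -> str:
    # Leaves the rest of the word alone, so BUtterfield keeps its spelling.
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word


def book_case(title: str) -> str:
    """Apply English "book" capitalization to a title in natural word order."""
    out = []
    force = True  # the first word is always capitalized
    for word in title.split():
        bare = re.sub(r"^\W+|\W+$", "", word).lower()
        if force or bare not in SMALL_WORDS:
            out.append(capitalize_first_letter(word))
        else:
            out.append(word.lower())
        # Force the next word after :, ., !, ? and after a leading article.
        force = word.endswith(SENTENCE_BREAKS) or (force and bare in ARTICLES)
    return " ".join(out)


print(book_case("the lord of the rings: the fellowship of the ring"))
# The Lord of the Rings: The Fellowship of the Ring
```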

Admittedly, this seems like a lot of rules, but most of them are common
sense; in reviewing over 350,000 titles, we've had to deal with a large
number of unusual cases.

Character names
---------------
We try to list character names as they appear in the on-screen credits
(i.e., the end titles cast listing).  We make occasional exceptions when
character names are not listed on screen or when the character descriptions
in the end titles include spoilers, but as a rule we try to stick to the
credits as closely as possible.

If you don't know the on-screen character name, or none is listed,
here are some guidelines to help with character name contributions:

1. Keep it simple

Please omit redundant information and irrelevant details: Ralph Fiennes'
character in Red Dragon (2002) is called Francis Dolarhyde, and that's
how he's listed in the credits. It's simply overkill to have him listed
as "Francis Dolarhyde/The Tooth Fairy/The Red Dragon" even though those
are factually correct descriptions.

Names are usually enough, and character names shouldn't be descriptive
unless absolutely necessary to identify the actor (e.g., if a role doesn't
have a name, someone may be identified as 'Man in van' or 'Woman with
umbrella').

Avoid extra embellishments, repetitions, and nicknames unless they are part of
the credited character name: it's enough to list Robert Patrick as John
Doggett in the "X-Files" TV series, instead of "Special Agent Jonathan Jay
'John' Doggett"; Jeri Ryan played Seven of Nine on "Star Trek: Voyager",
not "Seven of Nine, Tertiary Adjunct of Unimatrix 01, aka Annika Hansen";
Ed Norton played Will Graham in Red Dragon (2002), not "William Graham"
or "Special Agent Graham" or "FBI Special Agent William 'Will' Graham";
Matt LeBlanc plays Joey Tribbiani in "Friends", not "Joseph 'Joey'
Francis Tribbiani". You get the idea.

Whether that extra info is accurate or not doesn't matter. Robert
Englund's character in the Nightmare on Elm Street films is known as
Freddy Kruger, not Frederick Kruger or Frederick 'Freddy' Kruger, even
though Freddy is probably the diminutive of Frederick.

2. Character descriptions must be limited to the context of the film.

Anthony Hopkins plays Hannibal Lecter in The Silence of the Lambs (1991);
Brian Cox plays Hannibal Lecktor in Manhunter (1986). Yes, they're the
same character, but the spelling is different, and we will stick to each
film's own version.

Sigourney Weaver's character in Alien (1979) is called simply Ripley. The
fact that her first name is Ellen is not revealed until the sequel
Aliens (1986); therefore her character name in Alien (1979) is Ripley,
not Ellen Ripley.

Including extra information that comes from sources other than the film
is especially wrong: Nichelle Nichols plays "Uhura" in the TV series
"Star Trek" and in the films. Even though, according to some Star Trek
books and novelizations, her first name is Nyota, that name is not used
in the films or TV series to the best of our knowledge.

Even if the various Star Wars books and novelizations include the name,
rank, and serial number of every single Imperial Stormtrooper ever
shown in the films, we'll still list them all simply as 'Stormtroopers'
unless the on-screen credits have a different description.

3. No Spoilers

Ian Hart plays Professor Quirrell in Harry Potter and the Sorcerer's
Stone (2001). You're not supposed to know he's also Voldemort. Ian Holm
plays Sir William Gull in From Hell (2001). His character name is not
"Jack the Ripper". Those are both supposed to be surprises.

If you haven't seen those two films, we just spoiled them for you. Sorry
about that, but imagine how our users feel when they come to the site
and see those character names before seeing the film.  Even if factually
correct, character names that constitute spoilers must be avoided at
all costs.

This is especially true for multiple character names that can be
easily omitted: it's perfectly adequate to say that Cary Grant plays
"Peter Joshua" in Charade (1963). There is no need to say that he plays
"Peter Joshua/Alexander Dyle/Adam Canfield/Brian Cruikshank", even if
that's true.

4. Language

David Prowse plays Darth Vader in Star Wars (1977). Clint Eastwood plays
Harry Callahan in Dirty Harry (1971). Even though the Italian releases
of those films changed the names to Darth Fener and Harry Callaghan,
we will stick to the character names used in the original version.

5. For TV series, use years when needed

Cast changes are the rule on long-running TV series. Unless an actor has
been part of the cast for the entire run of a series, we try to include
the time frame of his or her appearances on the series.

For example, see the following character descriptions for "ER" (1994):

Noah Wyle      ...  Dr. John Carter
George Clooney ...  Dr. Doug Ross (1994-1999)
Paul McCrane   ...  Dr. Robert Romano (1997-)

Noah Wyle has been a cast member on "ER" (1994) since the first episode
and still appears in new episodes. His character name therefore doesn't
need a year attribute.  George Clooney was one of the original cast
members on "ER" (1994) but left the series in 1999.  Paul McCrane joined
"ER" in 1997 and is still a cast member to this day.  Note that these
years are part of the character name, not separate attributes.
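Since the years are part of the character name string, building them is simple string formatting.  A minimal Python sketch follows; the function and parameter names are hypothetical.

```python
from typing import Optional


def character_with_years(character: str, start: Optional[int] = None,
                         end: Optional[int] = None) -> str:
    """Append a series time frame to a character name.

    start=None means the actor has been in the cast for the whole run, so no
    years are added; a start with end=None means "still in the cast".
    """
    if start is None:
        return character
    return f"{character} ({start}-{end if end is not None else ''})"


print(character_with_years("Dr. John Carter"))            # Dr. John Carter
print(character_with_years("Dr. Doug Ross", 1994, 1999))  # Dr. Doug Ross (1994-1999)
print(character_with_years("Dr. Robert Romano", 1997))    # Dr. Robert Romano (1997-)
```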

Locations
---------
When sending location information, bear in mind that it becomes part of
a location tree (http://www.imdb.com/LocationTree), so it should make
sense within that structure.  In particular, the locations within a
country should be treated consistently; for example, within the United
States, the location must include a state, and (when possible) a city,
but not the county/parish unless no more detailed location is available.
Remember that each level of a location description is separated by a
comma; multiple locations within, say, the same state should appear as
separate location entries.
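One way to see why each level must be comma-separated, and why separate places need separate entries, is to model the tree as nested dictionaries keyed from the country down.  This Python sketch is only a model of our own for illustration, not how the Location Tree is actually stored.

```python
def add_location(tree: dict, location: str) -> dict:
    """Insert a comma-separated location (most specific level first) into a nested dict."""
    node = tree
    for level in reversed([part.strip() for part in location.split(",")]):
        node = node.setdefault(level, {})
    return tree


tree: dict = {}
add_location(tree, "Hollywood, Los Angeles, California, USA")
add_location(tree, "Venice, Los Angeles, California, USA")
print(sorted(tree["USA"]["California"]["Los Angeles"]))  # ['Hollywood', 'Venice']
```

Sent as two entries, Hollywood and Venice become siblings under Los Angeles; crammed into one string, they would instead create a bogus extra level.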

In general, smaller countries (both by area and by number of films)
should omit any political subdivision between the city and country.

Major cities and locations should be given their English names (thus,
Rome, Lazio, Italy, not Roma, Lazio, Italia); this also applies to
major international airports, etc.  Smaller towns and landmarks should
use the local names.  There is a long-range plan to allow proper use
of local names everywhere with automatic translation, but this is still
some time in the future.

Los Angeles deserves a few words of its own, both because of the number
of locations in the area and the complexity.  The city of Los Angeles has
several named neighborhoods that are actually part of the city; some of
the better known ones include Hollywood, Venice, Van Nuys, and Encino.
These are treated as divisions of the city (e.g., Hollywood, Los Angeles,
California, USA).  Named buildings are noted with their street address
at the same level -- for example, Bradbury Building - 304 S. Broadway,
Downtown, Los Angeles, California, USA.

The new LOCATIONCORRECT keyword can be very convenient when cleaning
up the locations in a given portion of the location tree.  The usage:

LOCATIONCORRECT
wrong-location|correct-location|
END

There is no web form support for this keyword; it must be sent directly
to the email interface (adds@imdb.com).  Changing a location will change
all subordinate locations; for example,

LOCATIONCORRECT
Rome, Italy|Rome, Lazio, Italy|
END

will also change Tivoli, Rome, Italy.
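Our reading of that behavior is a suffix replacement on comma boundaries.  The Python sketch below models a single LOCATIONCORRECT pair under that assumption; it is illustrative and is not the actual processing code ("New Rome" is a made-up name used to show the boundary check).

```python
def location_correct(location: str, wrong: str, correct: str) -> str:
    """Apply one LOCATIONCORRECT pair to a single location string.

    The location itself and everything subordinate to it (anything ending in
    ", " + wrong) is rewritten; matching on the ", " boundary keeps a name
    like "New Rome, Italy" from being treated as part of "Rome, Italy".
    """
    if location == wrong:
        return correct
    suffix = ", " + wrong
    if location.endswith(suffix):
        return location[: -len(suffix)] + ", " + correct
    return location


for loc in ["Rome, Italy", "Tivoli, Rome, Italy", "New Rome, Italy"]:
    print(location_correct(loc, "Rome, Italy", "Rome, Lazio, Italy"))
# Rome, Lazio, Italy
# Tivoli, Rome, Lazio, Italy
# New Rome, Italy
```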

Several countries have already been cleaned up; before starting on a major
cleanup project for a country, it's best to check in on the Contributors
Help message board to see if there are others working on that country
and to form a consensus on the proper subdivisions to use.

Writing credits
---------------
In the past, writing credits with no attributes were assumed to be
"(screenplay)".  This is no longer true; all writing attributes,
including (screenplay), (teleplay), and (written by), should be included
with writing credits.  In addition, "(also ...)" forms such as "(also
story)" should no longer be used; instead, send separate credits, such
as (story) and (screenplay).  For example, where you might have sent:

Doe, John|Title (2002)|(also novel)

you should send

Doe, John|Title (2002)|(novel)
Doe, John|Title (2002)|(screenplay)

If you are comfortable with sequence numbers, include them, but even
without them, credits should be split as shown here.
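If you have a file of old-style credits to convert, the split can be scripted.  This Python sketch assumes that "(also X)" means credit X plus the main writing credit, which defaults here to (screenplay) (pass "teleplay" for TV); the function name is hypothetical, and sequence numbers are not handled.

```python
import re


def split_also_credit(line: str, main_credit: str = "screenplay") -> list[str]:
    """Split an old-style '(also ...)' writing credit into separate credits.

    ASSUMPTION: '(also X)' is read as 'X' plus the main writing credit.
    Lines are in the Name|Title|attribute form used above.
    """
    name, title, attribute = line.split("|", 2)
    match = re.fullmatch(r"\(also (.+)\)", attribute.strip())
    if match is None:
        return [line]  # already in the new form; leave it alone
    return [f"{name}|{title}|({match.group(1)})",
            f"{name}|{title}|({main_credit})"]


for credit in split_also_credit("Doe, John|Title (2002)|(also novel)"):
    print(credit)
# Doe, John|Title (2002)|(novel)
# Doe, John|Title (2002)|(screenplay)
```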

It's probably worth noting here that "(written by)" has a specific
meaning, at least for titles covered by the Writers Guild of America
(WGA).  It means that the same writer(s) did essentially all the
writing -- story and screenplay/teleplay -- and there is no adapted
source material (novel, short story, article, etc.).

Cinematographer vs. director of photography
-------------------------------------------
In the past, the terms "cinematographer" and "director of photography"
were used interchangeably.  While we still believe they are virtually
identical, we now permit "(director of photography)" as an attribute
in the cinematographer list if that is how the on-screen credit reads.
Otherwise, cinematographers should still be sent with no attributes.

Processing cycle update
-----------------------
Since the last newsletter, we have continued reducing our cycle times.
This has been most visible on the guest appearance list, where data
is now processed every other day.  Many other lists are still being
processed on a weekly cycle, though not necessarily the same
Thursday-to-Wednesday cycle used for names and titles.

We have determined that the processing of alternate names is best
handled on a monthly cycle.

New title approval has made great strides recently.  We have made it
possible for several staff members to help with title approval; that,
combined with new tools, has greatly reduced our backlog.  In addition,
many of the people who contributed new titles still in backlog have
received mail messages informing them of what additional information
will speed approval of their titles.  Various groups of titles have been
identified for speedier approval; some of these groups, such as titles
from the USA or UK with valid release dates, no longer have backlogs.

EPISODECORRECT-GUEST
--------------------
One of the improvements we made to the processing of guest appearances is a
new keyword, EPISODECORRECT-GUEST, that makes it much easier to clean
up the episode lists for a given title.  In conjunction with some of
our contributors, we have already cleaned up the data for a number of
popular series.  Where the data for a series was fairly complete (and
the series is no longer in production), we have removed data that lacked
episode information.  This should serve as incentive to re-contribute
it with complete information.

Revised name filmographies
--------------------------
We've recently improved the name filmography pages.  Most notably,
appearances as "Himself" or "Herself" have been moved into a separate
category.  The "self filmography" will eventually be a separate category;
in the meantime, it includes appearances in Documentaries and tv-specials
(as determined by genre and keyword entries, respectively), appearances
marked (archive footage) with no character name, and appearances in
any type of project as "Himself" or "Herself."  We recognize this is
imperfect (for example, some documentaries use re-enactment actors who
are not playing themselves), which is why it's an interim approach; we
feel the benefits are significant enough, particularly for well-known
people with large "self" filmographies, to make it worthwhile.

Titles that are still in production are also being flagged.  This
area should be expanding in the future, as we are now including
more in-production data from our partners at the Hollywood Reporter.
(Subscribers to IMDbPro will note expanded company contact information
for such titles.)

UNIX tool releases: 3.20, 3.21
------------------------------
The moviedb package (a local UNIX version of the database) has again
been updated to correct various capacity problems.  Version 3.20 was
released in late January; version 3.21 was released in late March, and
is essential if you are using the current data files.  It can be found
at the usual FTP sites; see http://www.imdb.com/interfaces for details.

Installation remains the same as for earlier releases.  Note that xregal,
the X Window System interface, cannot be compiled with current releases
of X.  The changes in this release do not require recompiling xregal,
but some of the capacity problems will persist unless it is recompiled.
If you have a working binary of xregal, you
can keep using it, but you will probably see an increasing number of
crashes, particularly for name filmographies with long episode lists.
Alas, the author of xregal has chosen to stop supporting it, so a newer
version is not available.

To rebuild:  Extract the tar file into a directory named database.
Assuming you already have a copy of the database files, from ./database/ :

make compile
make installbin
cd imoviedb; make; make install
# If you are able to build xregal:
# cd ../xregal; make; make install
cd ..
make cleandbs
make update-local
./etc/cgencompl -all # optional

If it's not working for you, check the following things first:

 . Do you have enough disk space?
 . Are the source files for moviedb up to date?
 . Are all the binaries in database/bin/ and database/etc/ up to date?
 . Did you do *all* relevant steps above in the order listed?

For further support, contact unix@imdb.com.

Questions
---------
Q: Can people in still photos be listed in cast credits?

A: If, and only if, the person in a still photo is listed in the credits,
   they can be added to our credits list.  If they are not, they cannot be.
   If an uncredited photo is notable, then it should be listed in trivia.

Q: Do running times include commercials?

A: Ideally, no; however, particularly for older programs, this may be
   the only data available.  In this case, please add the attribute
   (including commercials).
---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 5 - THE END

Contributor Newsletter #4

In this issue
-------------
 - Name corrections
 - New TGQ acceptance policy
 - Processing cycles
 - Composers of non-original music
 - Notes on various list policies
 - 2002: Year in review
 - Correcting series years
 - Release date changes
 - UNIX tools 3.19 released
 - Questions

Making Name Corrections and the Name Space
------------------------------------------
by Duncan Smith, name manager

Introduction

IMDb now records more than 1.25 million separate names. Most of them are
individuals, although some are groups (for example, rock bands, choirs,
dance troupes and orchestras). Some are animals and there are even a few
inanimate objects. The intention is to allow - as a name - any person,
group, animal or object that receives screen credit as a cast or crew
member, with the exception of commercial companies (which have their
own, separate name space).

Numerals

It is not unusual for people to share the same name, and that is why we
use Roman numerals to distinguish them: (I), (II), (III), (IV) and so on.
Where someone is of sufficiently high profile, such as the famous American
director John Ford, we remove the numeral for display purposes in order not
to confuse people who are unfamiliar with the way we do things. Even then,
these "no numeral" names are in effect (I): the others are numbered (II)
and higher. We call the name that someone is usually known by - their stage
or working name - their PRIMARY NAME.

Naka

Many in the film industry end up being credited under more than one name.
This may be because they have deliberately changed their name at some point
- for example a woman who changes her name upon marrying, or someone who
moves to a country with a different language and doesn't want to use a
"foreign sounding" name - but a single name may also take several forms:
someone called John may also be known as Johnny or Jack. People may also
sometimes use initials, middle names, or other parts of their names (such
as Jr. or 'junior'). Names are even misspelled on prints!

We call such alternative names naka, short for "name aka." Aka in turn
stands for "also known as." We say naka to distinguish name akas from
title akas, alternate titles for films. It is important to know that
the namespace for nakas is kept completely separate from that for the
names themselves. In particular, there are no Roman numerals in the
naka namespace, even when the naka is the same as someone else's primary
name. This may seem arbitrary, but in fact helps with the management of
the name spaces both for primary names and nakas.

Note that naka data should not be used to link two names together or as
a substitute for namecorrecting.  Such "two-way" nakas are confusing and
make the true primary name obscure.  If two filmographies are the same
person, tell me - either by using NAMECORRECT or by using CORRECT-NAME or
COMMENT-NAME.  Much more on this below!

Accents and non-Roman Alphabets

I receive many requests to change a name by adding accents or changing the
case of part of a name. I often reject such requests, for many reasons.
I will deal first with names in the Roman or Western alphabet.

Just because a name is spelled with an accent in a language such as
French, Spanish or Italian does not mean that we should store it that
way. In the first place, film credits are often presented in upper case,
and in some languages capital letters are not accented. The majority
of someone's credits may well be without accents. Second, if someone
moves to an English-speaking country, they may deliberately not use the
accented form of their name, or may find that credit compilers -
unfamiliar with the language - drop the accents or cannot generate them
on their computer or typewriter.

Some languages, such as German and Dutch, use parts of names that may
begin with a lower case letter. I have been told that the rules for these
two languages are that von, van, vander, etc. always begin with a lower-
case 'v'. Not all our names are represented this way yet but in time I
will fix things up. Please do NOT, in the meantime, rearrange or correct
such names just for the sake of consistency. I will get around to it!

The situation for de, di, de la, etc. is more complicated. I have heard
of rules for French names involving distinguishing between aristocratic
names associated with land ownership ('de') and those derived from the
Flemish ('De'), but in practice this may be difficult. As far as the
Italian 'di' and the Spanish/Portuguese 'de la' are concerned I have yet
to fix on particular rules. Italian and Spanish correspondents are
welcome to contact me directly, but note again that capitalization may
depend on country of residence, not the linguistic origin of the name.

Languages that are based on a different alphabet such as Russian,
Japanese, Chinese, Korean, etc. are paradoxically sometimes easier to
represent. For example, Korea has recently fixed on a standard Roman
form: the last name is capitalized, but only the first part of the
first name is - the second part of the first name is all lower case:

	Kim, Dae-hyeon

Note that Chinese names, although apparently similar, always use
capitalized parts. For Russian and Japanese names, I tend to rely on
the expertise of contributors from these countries. Please note that we
are not yet completely consistent in our representation of non-Roman
names.
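The Korean and Chinese capitalization conventions described above can be sketched in Python as follows.  The function names are hypothetical, and the sketch assumes the family name and a hyphenated given name arrive as separate fields; it does not attempt Russian or Japanese forms.

```python
def korean_name(family: str, given: str) -> str:
    """Korean standard form: only the first part of the given name is capitalized."""
    first, sep, rest = given.partition("-")
    return f"{family.capitalize()}, {first.capitalize()}{sep}{rest.lower()}"


def chinese_name(family: str, given: str) -> str:
    """Chinese names look similar but capitalize every part of the given name."""
    parts = "-".join(part.capitalize() for part in given.split("-"))
    return f"{family.capitalize()}, {parts}"


print(korean_name("KIM", "DAE-HYEON"))  # Kim, Dae-hyeon
print(chinese_name("chow", "yun-fat"))  # Chow, Yun-Fat
```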

Namecorrecting

If a name is incorrect, the keyword to use is NAMECORRECT. This can be
qualified with the list extension to make a correction that is specific
to the names on that list. This is useful when two or more people have
been combined into a single primary name by accident. This is a far
better approach than deleting all the credits and re-entering them -
deletions will be processed before the new name is created, so credits
can disappear for a week or more, which is obviously undesirable. It
also involves a lot more work for us! So if for example the actor
John Doe is different from the costume designer John Doe, you could
separate them by sending the following data:

NAMECORRECT-ACTOR
Doe, John|Doe, John (I)|
NAMECORRECT-COSTUME
Doe, John|Doe, John (II)|

But what if you are combining two names into one, and you think the
two forms of the name should be retained? We have introduced a new
keyword, NAMECORRECTAS. This will namecorrect and add (as ...) at the
same time. Note that this functionality has been available to me when
carrying out management of the name space for quite some time now. So
for example,

NAMECORRECTAS
Doe, Jack|Doe, John|

will result in the credits for Jack Doe transferring to the filmography
for John Doe with (as Jack Doe) added to each. This keyword can also be
qualified by the list extension:

NAMECORRECTAS-ACTOR
Doe, Jack|Doe, John|
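To make the difference between the keywords concrete, here is a Python model of what NAMECORRECT, NAMECORRECT-list, NAMECORRECTAS, and NAMECORRECTAS-list do to a set of credits.  It is a hedged sketch of the behavior described above, not our internal code; the `Credit` record, the lower-case list names ("actor", "costume"), and the helper names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Credit:
    name: str              # primary name, e.g. "Doe, John (II)"
    title: str
    list_name: str         # hypothetical list keys, e.g. "actor", "costume"
    attributes: tuple = ()


def display_name(primary: str) -> str:
    """'Doe, Jack' -> 'Jack Doe'; a trailing numeral such as ' (II)' is dropped."""
    bare = re.sub(r" \([IVXLC]+\)$", "", primary)
    last, sep, first = bare.partition(", ")
    return f"{first} {last}" if sep else bare


def namecorrect(credits, wrong: str, correct: str,
                list_name: Optional[str] = None, add_as: bool = False):
    """Model NAMECORRECT[-list] (add_as=False) and NAMECORRECTAS[-list] (add_as=True)."""
    result = []
    for credit in credits:
        if credit.name == wrong and list_name in (None, credit.list_name):
            extra = (f"(as {display_name(wrong)})",) if add_as else ()
            credit = replace(credit, name=correct,
                             attributes=credit.attributes + extra)
        result.append(credit)
    return result


sample = [Credit("Doe, Jack", "Title (2002)", "actor")]
moved = namecorrect(sample, "Doe, Jack", "Doe, John", add_as=True)[0]
print(moved.name, moved.attributes)  # Doe, John ('(as Jack Doe)',)
```

Restricting the match by list is what lets the actor and the costume designer named John Doe be separated without touching each other's credits.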

Comment and correct

Please only use COMMENT-NAME and COMMENT-NAMECORRECT to provide further
information on a NAMECORRECT. Please only use CORRECT-NAME and
CORRECT-NAMECORRECT to indicate that a namecorrect is necessary but
could not be performed using the NAMECORRECT/NAMECORRECTAS keywords -
for experienced contributors, that should be almost never!

Group cast names

We are currently inconsistent in this area.  It's a mess and we
know it.  Eventually, group cast names will be treated as entities in
their own right and anomalies such as (as The Beatles) will disappear.
The credit will be for the primary name "Beatles, The," which will have
its own filmography but with links to the individual members.  Meanwhile,
please don't provide group cast naka for individuals, as I will reject them.
That is, do not submit 'The Beatles' as naka for, say, John Lennon:
he was never known personally as such - this was the name of the group
of which he was part, not an alternative name for the man himself.

In summary, where the credits list a group as such, please submit the
group name instead of the individuals, and treat it like a film title:
"The" and "A" should be placed last, after a comma.

Summary and Golden Rules

That's a lot of information, I know. Please bear in mind the following
key points:

1. Don't NAMECORRECT to add accents to names unless you have good evidence
   that the accent is always (or usually) used.  Grammatically based
   name corrections will be rejected!

2. Use NAMECORRECT-list or NAMECORRECTAS-list to separate filmographies
   when you can. Only use adds and deletes when absolutely necessary.

3. Don't use two-way naka as implicit namecorrects, don't send in group
   cast naka for individuals, and remember that naka don't need numerals.

New TGQ acceptance policy
-------------------------
by Tim Norris, co-manager, TGQ lists

During the past 12 months there have been a few changes to the way we
process Trivia, Goofs and Quotes (TGQ). The most obvious change was that
we acquired a new list manager and began to process them at all, working
our way through a large accumulated backlog (thanks to all regular TGQ
contributors for your patience). But there have been other changes as well.

With a new list manager, it's inevitable that there will be a new attitude
and new ideas, and we're slowly redefining our acceptance criteria for
items in the Goofs and Trivia lists. New guidelines are being written
and will be online later in the year, but for now we thought we might
forestall a certain amount of frustration and irritation among T&G
contributors by explaining one of the most important changes.

There's a strong case supporting the notion that IMDb's purpose is to
store all movie-related data and that *all* contributed goof and trivia
information, if true, should appear on the site. But there's a stronger
case for the idea that the T&G lists, designated as they are as "fun
stuff", need to be interesting and relevant as well as "true" (and not
included in other lists). Other lists can, and should, be thorough and
all-inclusive, but the T&G lists are more free-form and exist to inform,
astound and entertain. An unprocessed mind-dump of every single little
thing we receive would have no value - raw data of that sort is largely
useless - and editorial decisions are essential.

The problem, of course, is that it's difficult to empirically define
"interesting", but we do have a simple test which seems to work pretty
well when we've tried it around the team. Whenever we get an item that
falls somewhere in the shadowy no-man's-land between interesting and
ho-hum, we ask ourselves, "Would I be comfortable telling this to a
complete stranger at a party?" It's not infallible, and it has to be
weighted for fan-factor ("Would I be comfortable telling this to a
complete stranger at a convention?"), but as a starting point, it's
not bad. If you don't think your trivia nugget would impress one person
face-to-face, why share it with 13 million IMDb visitors?

As the backlogs clear, there will be more time available to evaluate the
information we already have, as well as to more critically examine the
new information that continues to come in, and we hope you'll see a
marked improvement in the quality of our T&G coverage as well as the
usual continued increase in quantity. We rely utterly on the excellent
stuff we get from you, and between us we can make the lists better than
ever. Over the course of the coming year it will become increasingly
difficult to get trivia and goofs listed, but it's not personal and it
shouldn't be a reason to give up - don't view this as a disincentive,
look at it as a challenge.

Processing cycles
-----------------
In the past, all data was collected into a weekly batch, and processed
on a weekly cycle.  Everything sent by 11 PM Pacific time Wednesday would
be collected into a batch and presented to our list managers; they would
then update their lists over the next week, as their schedule permitted.
New names and titles were added earlier on Wednesday so that data
associated with them could be included in the weekly batch.

As we have been updating our internal software, many lists
have moved to a daily basis to improve turnaround time.  Some of the
lists that have already moved to a daily cycle include trivia, goofs,
quotes, biographies, and the name and title URLs.  However, this doesn't
guarantee that all data will be processed each day; where a backlog exists
(as with the trivia, goofs, and quotes lists, known as TGQ), new data is
still processed in priority order.

We have also moved to a partial daily cycle for new titles.  The data
for new titles is still given to the list manager on a weekly basis,
but new titles are approved on a daily cycle.  For now, the reference
files are still produced on a weekly cycle, but we plan to move those to
a daily cycle as well.  For new titles that clearly meet the standards for
inclusion in the database, this should reduce the delay before they appear.
There is still a large backlog of titles with insufficient data to show
they qualify for inclusion.

Most other lists are still being processed on a weekly schedule, and are
generally processed within one week.  The exceptions: If a new name must
be approved, that adds a week to the cycle.  If a new company is sent,
the data may go into a backlog.  Alternate names (naka) are on a one-week
delay.  Besides the trivia, goof, and quotes lists, which have large
prioritized backlogs, the awards, awards-master, and DVD lists also have
large backlogs, and the laserdisc list is not currently being processed.

Composers of non-original music
-------------------------------
A recent change to our displays has highlighted some old but little-known
policies regarding composers of non-original music.

If a composer is credited for music used from another source (usually
classical music), and it's not appropriate to relegate them to the
soundtrack list, their composer credit should include the attribute
"(from ...)" indicating the work(s) of theirs that were used.  Examples:

van Beethoven, Ludwig|Clockwork Orange, A (1971)|(from "9th symphony")
Strauss, Richard|2001: A Space Odyssey (1968)|(from "Also sprach Zarathustra")

If a composer's music is reused, and again it's appropriate to put them
in the composer list, but there's no appropriate "from", then the
attribute "(r)" should be added.  This can be used when you know the works
of a classical composer were used, but don't know the title, or when music
from an earlier soundtrack was used.  Examples:

Bakaleinikoff, Mischa|Bank Alarm (1937)|(r) (stock music) (uncredited)
Berlin, Irving|Star Trek: Nemesis (2002)|(r) (song "Blue Skies")

However, using (r) for songs should be avoided; reused songs should
generally be included in the soundtrack list.  At some future date, the
entries in the
soundtrack list will be more completely integrated into filmographies.
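
If you prepare composer credits in bulk, these rules can be checked
mechanically before sending.  The Python sketch below is a hypothetical
helper, not an IMDb tool: it assumes one credit per line in the
name|title|attributes form shown above, classifies each credit by its
"(from ...)" or "(r)" attribute, and flags (r) credits for songs, which,
as noted above, should generally go to the soundtrack list instead.

```python
import re
from typing import NamedTuple

# Matches an attribute such as (from "Also sprach Zarathustra").
FROM_ATTR = re.compile(r'\(from "([^"]+)"\)')


class ComposerCredit(NamedTuple):
    name: str
    title: str
    attributes: str


def parse_credit(line: str) -> ComposerCredit:
    """Split one 'name|title|attributes' line into its three fields."""
    fields = line.split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields: {line!r}")
    name, title, attributes = (field.strip() for field in fields)
    return ComposerCredit(name, title, attributes)


def classify(credit: ComposerCredit) -> str:
    """Return 'from', 'reused', or 'original' based on the attributes."""
    if FROM_ATTR.search(credit.attributes):
        return "from"
    if "(r)" in credit.attributes:
        return "reused"
    return "original"


def belongs_in_soundtrack_list(credit: ComposerCredit) -> bool:
    """Flag (r) credits for songs, which should generally be soundtrack entries."""
    return classify(credit) == "reused" and '(song "' in credit.attributes


credit = parse_credit('Berlin, Irving|Star Trek: Nemesis (2002)|(r) (song "Blue Skies")')
print(classify(credit), belongs_in_soundtrack_list(credit))  # reused True
```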

Policies on various lists
-------------------------
Color list: Entries for newer movies should generally not include any
attributes.  Deluxe, CFI, FotoKem, and Technicolor are laboratories
(for newer movies) and should go to the technical list.  Information
about film stock (35mm, video, etc.) also belongs to the technical list.
Colorized films are not accepted in this list; they should be sent
to the alternate version list.

Certificates: Certificate entries are accepted for video games.
For USA titles, the current MPAA rating system was introduced in
1968.  Before that time, movies were approved and issued a PCA
number.  For newer movies, there is a certificate number that can be
sent as an attribute like "(certificate #34544)".

There is a difference between Unrated and Not Rated.  Not Rated means
the movie has never been rated.  Unrated means the movie has a regular
rating, but this version was altered and therefore has no rating; this is
usually a director's cut on DVD with more sex or violence.

Please note that certificates should reflect only the actual rating from
official censorship/rating authorities, not expected ratings.

Countries, Languages: These lists do have sequence numbers that should
be used when possible, but there is no current support for them in the
web interface.  Languages should not reflect dubbed languages, unless
that's the only format that was released.  Our country list reflects
the country that existed at the time the film was produced; for example,
German films from 1949-1990 should specify East Germany or West Germany.

Sound Systems: In the early days of sound films, a number of different
recording systems (Westrex, RCA Sound System, Western Electric Sound
System) were used.  Since all these systems produce an essentially
identical Mono soundtrack, we now list them as Mono|(system) -- for
example,

Rebecca (1940)|Mono|(Western Electric Sound System)

The process of cleaning up some of the existing entries is ongoing.

For newer movies, "Stereo" is usually not sufficient; instead, the
correct system (DTS, Dolby, Dolby Digital, etc.) should be used.

Running Times: For TV series, the running time per episode is used,
not the total running time.  If a series is no longer in production,
the total number of episodes can be added as an attribute.

Biographies / comments & corrects: Please don't use the COMMENT and
CORRECT keywords to contribute biographical data; use the BIOGRAPHY
keyword.  Also, corrections to existing data should be sent only under
CORRECT, not under both COMMENT and CORRECT.

2002: The year in review
------------------------
Once again, we've broken all our records over this past year.  The volume
of contributions has grown from almost 120,000 items weekly to over
143,000.  That includes an average of over 750 new titles approved each
week, along with a great deal of data on existing titles.  We've begun
revising our tools to make it easier to process this increased volume,
along with the expected further increase once our new additions system
is launched.  As noted above, many lists are now being processed on a
daily cycle, and more are on the way.

We've also been able to add many more photos, with galleries from major
events like the Academy Awards appearing overnight or even sooner.

Not only has the amount of data included and added to the database
continued to grow, so has the traffic.  Our monthly unique visitor count
has grown from almost 10 million to over 13 million in the past year,
and our traffic has continued to set records, with several days over
12 million page views.  The technology running the site is well on the
way through a complete revamp to allow us to handle this volume and to
make changes easier.  The biggest recent change, of course, has been the
new message board system, which has been much more popular than the old
system (as measured by page views); this includes, for the first time,
message boards associated with each name page.  Now that much of our
underlying technology has been replaced, we expect to introduce many
more new features in the coming year.

Correcting series years
-----------------------
The year range for a TV series often needs to be corrected, particularly
with current series as they are renewed or cancelled.  Our preferred
way of doing this (for now) is via a TITLECORRECT, specifying the
new year range.  For example:

TITLECORRECT
"Oz" (1997)|"Oz" (1997-2003)|
"NYPD Blue" (1993)|"NYPD Blue" (1993-????)|

This will not affect the actual title of the series; data for the above
series would continue to appear under "Oz" (1997) and "NYPD Blue" (1993).
This is just the most efficient way to communicate the year change to the
title editor, who has to make these changes by hand in any case.

If you are sending the title of a previously unknown series that has
already ended, you should include the years in the TITLE data as always:

TITLE
"I Love Lucy" (1951)|1951-1957|
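
Both kinds of line can be generated from a series name and its years.
The Python sketch below is a hypothetical helper, not part of the mail
server; it assumes the series name is already in database form (articles
moved to the end, as in "X Files, The") and uses "????" for a series that
is still running, as in the NYPD Blue example.

```python
from typing import Optional

RUNNING = "????"  # end-year placeholder for a series still in production


def series_title(name: str, first_year: int) -> str:
    """A series title: the quoted name plus the year it started."""
    return f'"{name}" ({first_year})'


def year_range(first_year: int, last_year: Optional[int] = None) -> str:
    """A 'first-last' range, with ???? when the series has not ended."""
    end = RUNNING if last_year is None else str(last_year)
    return f"{first_year}-{end}"


def titlecorrect_line(name: str, first_year: int,
                      last_year: Optional[int] = None) -> str:
    """One TITLECORRECT line: current title|title with year range|."""
    corrected = f'"{name}" ({year_range(first_year, last_year)})'
    return f"{series_title(name, first_year)}|{corrected}|"


def new_ended_series_line(name: str, first_year: int, last_year: int) -> str:
    """One TITLE line for a previously unknown series that has ended."""
    return f"{series_title(name, first_year)}|{year_range(first_year, last_year)}|"


print("TITLECORRECT")
print(titlecorrect_line("Oz", 1997, 2003))
print(titlecorrect_line("NYPD Blue", 1993))
```

Note that the title itself keeps only the starting year; the range lives
in the corrected copy (TITLECORRECT) or in the second field (TITLE).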

Release date changes
--------------------
We recognize that release dates of new films change frequently.  While
it's a good idea to provide a comment with any change that isn't quite
clear, a simple move of a release date is usually self-explanatory and
doesn't need a comment.  Exceptions are changing a release date from a
specific date to a vaguer one, and providing an earlier date in a
foreign country than in the film's native country.

UNIX tools 3.19 released
------------------------
The moviedb package (a local UNIX version of the database) has again
been updated to correct some capacity problems.  Version 3.19 was
released in late October, and is essential if you are using the
current data files.  It can be found at the usual FTP sites;
see http://www.imdb.com/interfaces for details.

Installation remains the same as for earlier releases.  Note that if
you are using the X Windows interface, xregal, it cannot be compiled
with most new releases of X.  However, the changes in this release do
not require recompilation of xregal, so if you have a working binary
of xregal, keep using it.  Alas, the author of xregal has chosen to
stop supporting it, so a newer version is not available.

To rebuild:  Extract the tar file into a directory named database.
Assuming you already have a copy of the database files, from ./database/ :

make compile
make installbin
cd imoviedb; make; make install
# If you need to build xregal and are able to:
# cd ../xregal; make; make install
cd ..
make cleandbs
make update-local
./etc/cgencompl -all # optional

If it's not working for you, check the following things first:

 . Do you have enough disk space?
 . Are the source files for moviedb up to date?
 . Are all the binaries in database/bin/ and database/etc/ up to date?
 . Did you do *all* relevant steps above in the order listed?

For further support, contact unix@imdb.com.

Questions
---------
Q: I found a title that doesn't meet the criteria you mentioned last issue.

A: Yes, the criteria have been tightened over time, so it's likely some
titles already in the database lack enough data to prove their eligibility.

You can point this out to the title manager with COMMENT-TITLE if you are
sure it does not meet the standards.  This includes announced projects
that were never made.

Q: Are release dates essential for old movies?

A: Not necessarily.  The article last issue was aimed at new releases, as
they give people the most problems.  For older movies, simply indicating a
major studio and a well-known cast or crew member will probably suffice.
However, release dates for older movies are generally pretty easy to find.
You should still include the year of release in the title, if possible.

Q: Do you accept music videos?

A: Music videos are currently accepted as titles only if they are released on
DVD / VHS (generally, these are compilations or sometimes longform videos).
The individual titles within such compilations are not accepted.

Music videos can be included in the "other works" section of biographical
information.

Q: How do I create a reference to a person or title in a trivia item?

A: Use the "qv-reference" format.  For a person, enclose the name in
apostrophes; for a title, enclose it in underscores.  Thus: 'John Doe' (qv)
appears in _Title (2002)_ (qv).  Please be sure to use character names
instead of actor names wherever relevant.  (The same format works in most
of the free-format text lists: goofs, soundtracks, biographies, etc.)
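
If you generate trivia text with a script, the qv format is simple to
produce.  This Python sketch uses two hypothetical helper functions of our
own; it only wraps names and titles as described above and does not check
that the name or title actually exists in the database.

```python
def qv_person(name: str) -> str:
    """Reference to a person: the name in apostrophes, then (qv)."""
    return f"'{name}' (qv)"


def qv_title(title: str) -> str:
    """Reference to a title: the full title in underscores, then (qv)."""
    return f"_{title}_ (qv)"


sentence = f"{qv_person('John Doe')} appears in {qv_title('Title (2002)')}."
print(sentence)  # 'John Doe' (qv) appears in _Title (2002)_ (qv).
```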
---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 4 - THE END

Contributor Newsletter #3

In this issue
-------------
 - New message board
 - New keyword: LOCATIONCORRECT
 - Getting a new title accepted
 - New additions system in development
 - Cast order revisited
 - Update on trivia, goofs, and quotes
 - Short questions

New message board
-----------------
Since the last newsletter, a new message board system has launched on the
site.  Along with a number of new features, we have also added a special
message board for people who contribute data, called "Contributors Help."
This board is actively monitored by IMDb staff, who attempt to respond
as necessary (though we're gratified that our community of visitors
handles some of the common queries for us).  You can visit this new
message board at http://www.imdb.com/board/bd0000042/threads/ or just
click on the "Message Boards" tab at the top of any page and scroll down.

New keyword: LOCATIONCORRECT
----------------------------
In response to a comment on the message board, a new keyword has been
created: LOCATIONCORRECT.  This keyword is used to correct entries in
the location list where the location is misspelled, or does not fit
properly into the hierarchy at http://www.imdb.com/LocationTree .  It
should not be used if the entry for a movie is a valid real-world location,
just not where the movie was filmed.

To use the keyword, send something like this to the mail server:

LOCATIONCORRECT
wrong location|right location|

Thus:

LOCATIONCORRECT
Los Angeles, CA, USA|Los Angeles, California, USA|

You must supply the complete location for it to be corrected; thus, in the
example above, correcting just "CA, USA" would not work.
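
A LOCATIONCORRECT message can be assembled from (wrong, right) pairs.  The
Python sketch below is hypothetical; in particular, apply_correction is our
guess at the matching rule described above (a correction applies only to an
entry equal to the complete wrong location), which is why a partial string
like "CA, USA" changes nothing.

```python
from typing import Iterable, List, Tuple


def locationcorrect_block(corrections: Iterable[Tuple[str, str]]) -> str:
    """Build a LOCATIONCORRECT submission with one 'wrong|right|' line each."""
    lines = ["LOCATIONCORRECT"]
    for wrong, right in corrections:
        if "|" in wrong or "|" in right:
            raise ValueError("a location may not contain '|'")
        lines.append(f"{wrong}|{right}|")
    return "\n".join(lines)


def apply_correction(locations: List[str], wrong: str, right: str) -> List[str]:
    """Assumed rule: only entries equal to the whole 'wrong' string change."""
    return [right if loc == wrong else loc for loc in locations]


print(locationcorrect_block([("Los Angeles, CA, USA",
                              "Los Angeles, California, USA")]))
```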

Getting a new title accepted
----------------------------
One of the more common questions we get from frustrated contributors is:
Why has my new title not been accepted?  As many of you know, there's
currently a processing delay of a few weeks, so sometimes the answer is
just, "be patient," but often the title has been rejected.

The most common reason for a title to be rejected is that it does not
meet our criteria for inclusion.  They are spelled out in our new title
guide at http://www.imdb.com/Guides/new-titles, but the short version is
that a new title must be of "general public interest" and must be/have
been available to the public.

This means that an independent film still in production made by people
with no track record is, generally, ineligible.  But many titles that
are eligible for listing do not get included because the data fails to
demonstrate that they meet the criteria.

If a film has been shown to the public, it's critical to include a
release date.  This can be approximate (e.g., month and year only),
but it should include any appropriate modifiers, like the name of the
film festival, the city for a single-city release, or "(limited)" for a
limited multi-city release.  If the film has a distributor, that should
be included; if it debuted on television, then the name of the television
network should be used as the distributor.  If it's available on video
through established channels, then again, the name of the distributor
and release date are important.  A Google search is often helpful if
it's been shown in festivals (try the title plus the director name).
While including an official URL is helpful, be aware that it's not a
substitute for including the distributor name (even if the URL is a page
within a distributor's web site).

If a film is still in production, it's eligible for listing if there's
a high probability it will be available to the public.  In general,
that means an established production company and/or distributor, or
well-known filmmakers/actors.  In this case, we ask for a certain amount
of information so we can properly track the film.  Films in production
often change title and the people working on them can change, so we need
more than just a title and a single name to allow us to track them.
In any case, if there's only one name connected with the title, it's
not far enough along in the process to have a high probability of being
made (ask anyone who's seen the mountain of unproduced screenplays at
most studios).  With the exception of a few high-profile projects that
have an excellent chance of being made (e.g., Star Wars Episode III),
we don't track films until they are solidly into the preproduction phase.

The volume of data we process means we generally ask the contributor to do
the research to show that a title qualifies; we add over 1000 new titles
to the database in some weeks, and hundreds more are discarded.  Many of
those rejected titles eventually do find their way into the database,
either because someone eventually sends them with enough relevant data,
or because they get accepted by a festival and thus become eligible.

Please note that the only data that counts is formatted data contributed
through our web interface; vague comments like "I hear he's making a
movie with so-and-so" are worthless to us.  However, if there's some
unusual reason why a movie should be included despite not meeting the
normal criteria, please do include a comment explaining why along with
the other data (which should still include as much formatted data
as possible).  That comment should be included as a COMMENT-TITLE.
One example that comes to mind: George Lucas in Love, which was widely
available to Hollywood insiders and was written about in several major
publications before it was accepted by a festival.

New additions system in development
-----------------------------------
As has been mentioned on the message board, a replacement for the current
additions system is in development.  While it is still some weeks away,
some of the main features can be discussed now.

In the existing additions system, the web interface serves only to format
data for the mail interface; all data goes through the mail interface.
In the new system, the web interface will become the primary means of
contributing data.  This means that many of the more mysterious rejection
messages will no longer appear, as there will no longer be a disconnect
between what the web interface accepts and what the mail server accepts.

An interface will be available for bulk contributors, but it will feed
into the web interface.  Instead of mailing data to the current mail
interface, you will upload a file and get immediate feedback.

As there are a number of things that can be correctly sent through the
mail interface today that cannot be created with the web interface,
this will mean a complete reworking of the web interface to allow all
valid data to be sent.  This will also mean errors are detected closer to
their source, which should allow us to give more helpful error messages.
The web interface will probably be deployed on a section-by-section basis;
the first section is currently projected to be release dates.

Existing data will be much easier to correct with the new system as well,
with easy-to-use forms instead of the current complicated process.

One of the best features of the new system will be the ability to see what
data you've sent that is still pending and its current status.  You will
also be able to modify or add to data you've sent that is still pending.
Also, in our long range plans, you'll be able to see and comment on data
from other contributors that is pending.

We expect that the web interface will feed more directly into our
back-end processing tools, which should improve the turnaround time for
processing data.  Among other things, it means data will no longer be
held in weekly batches, but will become available to list managers on
a daily, or even continuous, basis.  (This doesn't necessarily mean it
will be processed on a daily basis; some list management may still be
handled on a weekly cycle for a while.)

As the new additions system progresses, we may ask for volunteer beta
testers, most likely on the message boards.

Cast order revisited
--------------------
It's been pointed out that the article in the last issue on cast ordering
was a bit oversimplified.  To reiterate that article: The cast list order
is determined by the most comprehensive cast list.  In modern films, that
list is usually in the closing credits.

However, some productions (notably TV movies) split the cast list, with
the major stars listed in the opening credits and only the supporting
players listed in the closing credits.  In such cases, the cast list
should be treated as a single cast list, interrupted by the movie itself.

Update on trivia, goofs, and quotes
-----------------------------------
We're pleased to announce that the backlog of trivia, goofs, and quotes
for the 1000 most popular titles in the database (as determined by page
views) has been cleared.  We expect to keep up with new contributions
for the top 1000 titles, and to make progress on the backlog, where we
will continue to focus on the most popular remaining titles.  We have
also begun processing comments and corrections for the top 1000 titles.

Questions
---------
Q: Why don't you list movie-links for TV series (e.g., references made by
the series)?

A: Particularly for a long-running series, the list would become unwieldy.
When we support episode titles, we will reconsider this.

Q: Now that episode lists have been split into separate entries, does the
earlier limit of 5 episodes per person still hold?

A: No.

Q: What's the dividing line between a short and a feature?

A: 40 minutes, though we allow a few minutes slack as some sources don't
consider credits in their timings.

Q: What's the difference between a miniseries and a TV movie?

A: Anything over 240 minutes running time (excluding commercials) is a
miniseries; anything shorter is a TV movie, even if it's broadcast in
multiple installments.
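
The two running-time rules above can be sketched as follows.  This Python
code is illustrative only: the newsletter says just "a few minutes" of
slack, so SLACK = 3 is a hypothetical value, and we treat a reported time
just under 40 minutes as borderline because the source may have left out
the credits.  A TV production of exactly 240 minutes is not "over 240", so
it is treated here as a TV movie.

```python
SHORT_LIMIT = 40   # minutes: the short/feature dividing line
SLACK = 3          # hypothetical value for "a few minutes" of slack
MINI_LIMIT = 240   # minutes, excluding commercials


def film_length_class(minutes: int) -> str:
    """Classify a running time as 'short', 'feature', or 'borderline'.

    Times just under 40 minutes are 'borderline' because the timing may
    exclude the credits.
    """
    if minutes >= SHORT_LIMIT:
        return "feature"
    if minutes >= SHORT_LIMIT - SLACK:
        return "borderline"
    return "short"


def tv_suffix(minutes_without_commercials: int) -> str:
    """Over 240 minutes is a miniseries; anything shorter is a TV movie."""
    return "(mini)" if minutes_without_commercials > MINI_LIMIT else "(TV)"


print(film_length_class(38), tv_suffix(300))  # borderline (mini)
```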

Q: Now that you accept animal credits, you should also indicate the type
of animal.

A: Good idea.  We'll think about adding a special place for this; in the
meantime, please submit an appropriate biography trivia entry.

Q: Where should the titles of individual episodes of theatrical serials
be placed?

A: For now, send them as trivia entries for the serial title.

Q: Are quotes accepted in any language other than English?

A: No.
---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 3 - THE END

Contributor Newsletter #2

This is the IMDb contributor's newsletter, published every 6-8 weeks.
To unsubscribe, send a message to data-news-unsubscribe@mlists.imdb.com.
To subscribe, send a message to data-news-subscribe@mlists.imdb.com.
You can also use the signup page at http://www.imdb.com/maillists .

Feedback on these articles or suggestions for new topics are welcome;
contact dnews@imdb.com.  The most interesting questions will be used
in the next issue.

Issue #2

In this issue
-------------
 - What happens when you submit data
 - Cast credit order
 - Historical figures in cast lists
 - Episode lists
 - Some comments about AKA titles
 - UNIX tools 3.18 released
 - Feedback

What happens when you submit data
---------------------------------
People sometimes wonder what happens to their data after they submit it.
It is not placed online immediately.  Once the mail server has accepted
your data, it is accumulated until about 7 AM GMT Thursday (11 PM PST
Wednesday).  The entire week's data is sent to the managers of the
various portions of the database.  Each list manager then extracts
the data for the parts of the database they are responsible for.
The data is sorted and duplicates are eliminated.  The list managers
spend some time making sure the data is formatted correctly and checking
for various inconsistencies, such as people working before their birth
or after their death (this may indicate two people with the same name,
but not necessarily).  Some data is checked against official sources.
As the various database managers complete work on a list, they upload
their information; the database is rebuilt nightly using whatever's been
added that day.  Some browsable sections of the database are rebuilt on
a weekly cycle.
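
To work out which weekly batch a submission will fall into, the cutoff can
be computed from the current time.  This Python sketch is ours, not an IMDb
tool; it assumes a fixed Pacific Standard Time offset (UTC-8) and ignores
daylight saving time.  At that offset, 11 PM PST Wednesday is 07:00 GMT
Thursday.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed PST offset (UTC-8); daylight saving time is ignored.
PST = timezone(timedelta(hours=-8), "PST")
WEDNESDAY = 2      # datetime.weekday(): Monday is 0
CUTOFF_HOUR = 23   # 11 PM


def next_cutoff(now: datetime) -> datetime:
    """Return the first Wednesday 11 PM PST cutoff at or after `now`."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    local = now.astimezone(PST)
    days_ahead = (WEDNESDAY - local.weekday()) % 7
    cutoff = (local + timedelta(days=days_ahead)).replace(
        hour=CUTOFF_HOUR, minute=0, second=0, microsecond=0)
    if cutoff < local:  # already past this Wednesday's cutoff
        cutoff += timedelta(days=7)
    return cutoff


now = datetime(2002, 11, 13, 12, 0, tzinfo=timezone.utc)  # a Wednesday
print(next_cutoff(now).astimezone(timezone.utc))  # 2002-11-14 07:00:00+00:00
```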

That's the normal cycle.  However, when you add a new title, it has to
go through additional processing.  Because people often submit titles
that are not really new, or are not appropriate for inclusion, each
title must be examined and approved manually, based in part on the data
submitted along with the title, which is why it's important to submit as
much information as possible along with a new title.  This currently adds
two to four weeks to the cycle; data will not appear online until the
title it is associated with has been approved.  In addition, new names
must also be approved for similar reasons; this adds about a week's delay.
If data is submitted to the wrong list (e.g., a casting assistant,
which belongs in the miscellaneous crew list, submitted to the casting
directors list), rerouting it adds another week or two.

While a title or name is awaiting approval, the data is kept to one side.
After the title/name is approved, the data is normally included the next
time the list is processed, which means it should appear within a week.
Unfortunately, it does sometimes get lost if there is an unusually
long delay or other problems; we are working to reduce the number of
these cases.

For certain kinds of data, additional work is needed.  Submissions of
URLs for new sites are verified to be sure the site meets our guidelines
of appropriateness (for example, sites submitted for a title must pertain
to that specific title, not a company or actor).  Finally, those lists
with free-form text need manual copy editing for wording and duplicates.
This takes varying amounts of time for the various lists, depending on
submission volumes and quality, along with the backlog for those lists
(see the article last issue about the "TGQ" lists).  A reminder that
the TGQ backlog is processed in priority order; we've made excellent
progress in the last 2 months.

For reasons of timeliness, some information provided by IMDb staff
bypasses part of this process.  Most notably, editors collect box office
data and links to reviews at some web sites; these are updated in the
nightly build mentioned earlier.  We also update biographies when someone
notable dies, and information for certain high-profile awards will also
appear online much faster.  On the IMDbPro site, some of this information
doesn't even have to wait for a nightly build.

Over the next year, we hope to streamline the submission process,
eliminating weekly batching and making changes that should reduce the
number of bad title submissions.  There will also be opportunities to
see and comment on data that has been submitted but not processed.

This process has already begun; for example, URLs are processed daily,
not weekly.

Cast credit order
-----------------
The cast of a film is one of two sections of the database that do
not necessarily appear in alphabetical order (the other is the writing
credits).  The rule for determining this order can be confusing, since
we don't necessarily list the biggest stars first.

The rule is this: the correct order for credits is that of the most
comprehensive cast list, which in modern films is usually at the end.
If that leaves the stars way down in the list, so be it.

We do have another system for marking principal cast members that we have
not yet fully deployed; that will allow us to feature those actors on
the overview page regardless of cast order.  We also expect to someday
flag whether the cast is billed alphabetically or in order of appearance
(the two most common counter-billing orders).

Historical figures in cast lists
--------------------------------
We have many appearances by historical figures playing themselves
(e.g. Richard Nixon).  It's very hard to draw the line here because in
some cases those credits are valid and useful to have.  Keeping Nixon
as an example, some cases where the 'credit' is valid:

# "Cold War" (1998) (mini)
# Reel Radicals: The Sixties Revolution in Film (2002) (TV)
# Making of a Leader (1919-1968), The (1994) (TV)
# Houston, We've Got a Problem (1994)
# Secret Life of Richard Nixon, The (2000) (TV)

In other cases the credit is superfluous and should go.  For example:

# Frequency (2000)
# Contact (1997)
# Doors, The (1991)

Even though footage of him was used in those films (and we mark his
appearance as 'archive footage'), these appearances do not belong in
the main cast list.

The problem is that in many cases it's hard or impossible to make the
distinction unless you are familiar with the film.  For example, when
you see a credit like "Watergate" (1994) (mini), you don't really know
if this is a legitimate documentary appearance or a fictional, fact-based
program that uses footage of Nixon the same way Contact (1997) or JFK (1991)
do.

In some cases it can be determined by checking the data we have on the
title (whether it's a docu, whether all other credits are for professional
actors or historical figures etc.) but that requires tools/time/effort
that we don't have right now.

We do reject many similar credits (many appearances by Hitler, Bill
Clinton, JFK or other historical figures are rejected every week).
The ones that are listed in the database managed to creep into the lists.
Not all submitters share the view that we should reject those credits;
some see them online, assume that this is the norm, and send more of them.
All this makes it harder to reject them, especially if we are not familiar
with the titles involved.

At this time we are erring on the side of accepting the credits when
in doubt, and possibly removing them later when they are determined to
be invalid.

At some future time, we may create another way of listing such appearances
that would clearly separate them from the main cast list.

Episode lists
-------------
As the coverage of television episodes has grown, some crew members have
accumulated episode lists that have become unmanageably long.  A more
comprehensive solution to episodes is in the works, but until it arrives,
we are using another approach.  Where in the past you may have submitted
multiple episodes in a single entry, like this:

Spotnitz, Frank|"X Files, The" (1993)|(episodes "Alone (2001)", "Daemonicus (2001)")

you should now submit each episode separately, like this:

Spotnitz, Frank|"X Files, The" (1993)|(episode "Alone (2001)")
Spotnitz, Frank|"X Files, The" (1993)|(episode "Daemonicus (2001)")

Existing entries are being converted.  In some cases, episode lists
were temporarily replaced with "(multiple episodes)"; these should be
converted back shortly as well.
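
If you have a file of old multi-episode entries, they can be split
mechanically.  The Python sketch below is a hypothetical conversion helper,
not the tool used for the existing data; it assumes episode names never
contain double quotes and keeps any other attributes on the line unchanged.

```python
import re
from typing import List

# The multi-episode attribute, e.g. (episodes "Alone (2001)", "Daemonicus (2001)").
EPISODES_ATTR = re.compile(r'\(episodes ((?:"[^"]*"(?:, )?)+)\)')
QUOTED = re.compile(r'"([^"]*)"')


def split_episodes(line: str) -> List[str]:
    """Turn one multi-episode entry into one entry per episode."""
    match = EPISODES_ATTR.search(line)
    if match is None:
        return [line]  # nothing to split
    before, after = line[:match.start()], line[match.end():]
    return [f'{before}(episode "{episode}"){after}'
            for episode in QUOTED.findall(match.group(1))]


entry = ('Spotnitz, Frank|"X Files, The" (1993)|'
         '(episodes "Alone (2001)", "Daemonicus (2001)")')
for new_line in split_episodes(entry):
    print(new_line)
```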

Some comments about AKA titles
------------------------------

After the last newsletter, we got some feedback about aka titles
(alternative titles) in IMDb.  This is a response to those remarks.

We mentioned in the last issue that IMDbPro displays USA titles
where available.  This includes only those titles marked (USA)
with no additional attributes like (informal English title).  Thus,
a title that is only a translation used in a review and not an actual
release title should be marked appropriately; other possibilities are
(informal literal English title) and (video title).  Unfortunately,
we add about 1000 aka titles each week, and are unable to investigate
each one in depth.  It's thus more important than ever to be sure to use
the correct attributes on alternative titles.  If a title should have an
attribute and does not, please use CORRECT-AKA to point out the omission.
The attribute (theatrical title) should only be used for, and is only
present on, TV movies, mini-series, and video titles that would not
normally get a theatrical release.

There are many alternate titles with no attribute from the time before
we attached attributes to alternate titles; again, if you know what the
correct attribute should be, please report it with a CORRECT-AKA.

The year in an aka title should correspond to the year that title was
used.  Our tools will, by default, force the year in the aka to match
the year in the primary title.  However, if the aka title specifies
a country and we have a release date in that country with a different
year, the year in the aka title will be corrected to match.  Years in
less structured lists, such as distributor attributes, cannot be used
for this purpose; the only release years that really matter are those
in the release date list.
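
The year rule can be expressed as a small function.  This Python sketch is
hypothetical and not our actual tool: release_dates stands for (country,
year) pairs taken from the release date list only, and where a country has
several release dates we assume the earliest one is used.

```python
from typing import Iterable, Optional, Tuple


def aka_year(primary_year: int, aka_country: Optional[str],
             release_dates: Iterable[Tuple[str, int]]) -> int:
    """Year to show in an aka title.

    release_dates holds (country, year) pairs from the release date list;
    years from less structured lists are deliberately not consulted.
    """
    years = [year for country, year in release_dates if country == aka_country]
    if aka_country is None or not years:
        return primary_year  # default: match the primary title's year
    return min(years)  # assumption: the earliest release in that country wins


print(aka_year(1997, "USA", [("Japan", 1997), ("USA", 1998)]))  # 1998
```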

Finally, we recognize that aka titles for languages using non-Roman
alphabets are not always consistent.  While our title manager is fluent in
four languages, there are many languages where his knowledge is minimal
to zero.  Correcting transliterations requires detailed knowledge about
the original language, character set and transliteration rules.  We do
not have this knowledge for Japanese, Russian, Indian languages and
others, so we depend on the knowledge of our users here.  The usual
ways of correcting data apply.  There will be no satisfactory solution
to this problem until experts are available to review the complete set
of titles for a language and enforce a single standard on every title.

UNIX tools 3.18 released
------------------------
Version 3.18 of the locally installed database package (moviedb) has
been released.  It can be found at the usual FTP sites;
see http://us.imdb.com/interfaces for details.

There is one major change in this release.  The previous versions
could only handle 60,000 titles with votes; that limit has been removed
in this version.  In addition, various compile-time warnings should
no longer occur.

Installation remains the same as for earlier releases.  Note that the
X Windows interface, xregal, cannot be compiled with most recent
releases of X.  However, the changes in this release do
not require recompilation of xregal, so if you have a working binary
of xregal, keep using it.  Alas, the author of xregal has chosen to
stop supporting it, so a newer version is not available.

To rebuild: extract the tar file into a directory named database.  Then,
assuming you already have a copy of the database files, run the
following from ./database/:

make compile
make installbin
cd imoviedb; make; make install
# If you need to build xregal and are able to:
# cd ../xregal; make; make install
cd ..
make cleandbs
make update-local
./etc/cgencompl -all # optional

If it's not working for you, check the following things first:

 . Do you have enough disk space?
 . Are the source files for moviedb up to date?
 . Are all the binaries in database/bin/ and database/etc/ up to date?
 . Did you do *all* relevant steps above in the order listed?

For further support, contact unix@imdb.com.

Feedback
--------
Thanks to the people that commented on the first issue of the newsletter.
By far the most popular questions centered around our processing cycle,
which is why the lead article in this issue is an overview of that cycle.
Another popular question had to do with the proper method of correcting
names; a major article on that subject is planned for the next issue.
Many of the other articles in this issue were also inspired by user
questions.

Some other questions (summarized):

Q: Don't goofs take a lot of time to check?  Are people really that
interested?

A: They actually take more time to edit for readability and check for
duplicates than to research, but yes, our logs show that goofs are a
very popular part of the database.

Q: Do you accept submissions for animal performers?

A: We've recently changed our policy on this.  If an animal performer is
credited in the cast list, they can now be submitted with the regular cast
list.  If an animal performer is uncredited, or if their credit is buried
in the miscellaneous credits of a movie, then we do not accept them.
You should make your best guess of the actual gender of the animal when
determining whether to submit it to the actor or actress list.

Q: Why hasn't my miscellaneous crew submission appeared?

A: The backlog on this list had been running at about 4 weeks.  It has
recently been cleared, and processing is now back to normal.

---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 2 - THE END

Contributor Newsletter #1

Why this newsletter
-------------------
As our submission volume continues to increase, we see a number of common
problems, many of which cause data to be rejected.  Unfortunately, due
to the sheer volume of submissions, we can't tell everyone when we have
to reject data.  In addition, from time to time we change a policy
and have had no way to let our submitters know.  Finally, while we do
have a feedback address (additions-help@imdb.com) for problems with the
submission process, there has been no good place for feedback on general
data policy issues.  We decided that reviving our old newsletter, which
had fallen by the wayside after we launched our daily newsletter,
was the best approach.

In future issues, we plan to include more tutorials, answer some of
the best questions submitted by our contributors, and possibly pose
some research challenges.  For this issue, we've got a lot of news
to cover.

In this issue
-------------
 - 2001 in review
 - Roman numerals in alternate names
 - Backlog on the "TGQ" lists
 - How to get your goof accepted
 - Title display on IMDbPro
 - Running times without countries
 - Plots and biographies
 - Soundtrack submissions

2001 in review
--------------
The year 2001 was again a record year for submissions to the database.
We received 6,228,316 lines of data, about a 35% increase from the
previous year.  Submissions this year are already running 10-20%
over last year's weekly average.  We also added 25,000 new movies
last year, or about 10% growth; we finished the year with over 297,000
titles and have already added 11,000 more, despite more stringent rules
for inclusion.  Overall filmography data grew by about 22% last year.
Our top 2 submitters (aside from IMDb staff) each contributed about
100,000 lines of data.

We've been working hard to improve our processing tools to help us keep
up with the increase in submissions; we've also added staff both last year
and this year.  Our profound thanks to all our contributors; you've helped
make us the most comprehensive source of movie information anywhere.

Roman numerals in alternate names
---------------------------------
As experienced contributors are aware, when two people have the same name,
we separate their listings by assigning an arbitrary Roman numeral in
parentheses to each person.  In some cases, we omit the Roman numeral
when one person is much more famous than the other (for example,
Harrison Ford).

A few months ago, we made a change in the way we handle alternate names.
In the past, all names had to be unique, which meant alternate names had
to include Roman numerals just like primary names.  Now, alternate names
never include Roman numerals.  This can cause confusion when an
alternate name duplicates a primary name that has no Roman numeral
(for example, Steve Allen), so please take extra care in those cases.
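A small sketch of the two points above, assuming names are plain strings ending in a parenthesized Roman numeral such as "John Smith (II)" (a hypothetical name): alternate names drop the numeral, and the cases needing extra care are alternate names that match a primary name with no numeral.  The helper names are hypothetical, not part of any IMDb tool.

```python
import re
from typing import Iterable, List

# A trailing disambiguating Roman numeral, e.g. " (I)" or " (XII)".
ROMAN_SUFFIX = re.compile(r"\s*\([IVXLCDM]+\)$")


def alternate_name_form(name: str) -> str:
    """Strip the Roman numeral, since alternate names never carry one."""
    return ROMAN_SUFFIX.sub("", name)


def needs_extra_care(primary_names: Iterable[str],
                     alternate_names: Iterable[str]) -> List[str]:
    """Alternate names identical to a primary name that has no Roman numeral.

    In these cases (like the Steve Allen example above) a bare name could
    refer either to the primary listing or to someone else's alternate name.
    """
    unnumbered = {n for n in primary_names if not ROMAN_SUFFIX.search(n)}
    stripped = {alternate_name_form(a) for a in alternate_names}
    return sorted(stripped & unnumbered)
```

Only a parenthesized numeral at the very end is removed, so a regnal number that is part of the name itself (as in "Louis XIV (I)") survives as "Louis XIV".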

If you are using the local interfaces, make sure you are running
version 3.17 (released in November 2001) or later; that release added
partial support for the new policy.

It's worth noting that names are now managed centrally; in the past,
each list manager handled names separately, which could cause problems
if a name that appeared on two or more lists needed to be split.  The
alternate names are outside of that central management system.

Backlog on the "TGQ" lists
--------------------------
The trivia, goof and quotes lists (or TGQ as we rather snappily call
them round here) were a little neglected towards the end of last year
and that has created something of a backlog of new additions.  But you
might have noticed that we're already working again on the trivia and
goof lists, and you'll no doubt be delighted to learn that work on the
quote list will begin very soon.

Nothing has been lost during this brief lull, and nothing will be
overlooked now that we're working again, but the nature of the lists
means that every item has to be read, checked, and edited by a real live
Human Being (you remember "people" - we were very popular in the '70s),
and it will take some time to clear the backlog completely.  It might
take quite a while before we get to your submission, but it's not lost
and we shan't ignore it.

Just charging headlong at the backlogs and clearing them in the order
they had been submitted didn't seem like the best use of resources.
We decided that we would work on the backlog by title rather than
submission date (i.e., clearing every submission for a title regardless
of its age) and that we could provide the best service to the greatest
number of our users by focusing on more popular titles (the titles that
the greatest number of people look at) first.  This doesn't mean that
the less frequently-hit titles will be forgotten about, just that it
will take a while longer for us to get to them.
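The ordering described above can be sketched in a few lines.  The Submission record, its fields, and the page_views hit counts are hypothetical stand-ins; ordering submissions oldest first within a title and breaking popularity ties alphabetically are our own assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Submission:
    title: str
    submitted: str  # ISO date such as "2001-11-03" (hypothetical field)
    text: str


def backlog_order(
    subs: List[Submission], page_views: Dict[str, int]
) -> List[Tuple[str, List[Submission]]]:
    """Group the backlog by title, then work through titles by popularity.

    Every submission for a title is handled together regardless of its age.
    Titles with equal traffic fall back to alphabetical order.
    """
    by_title: Dict[str, List[Submission]] = defaultdict(list)
    for sub in subs:
        by_title[sub.title].append(sub)
    # Most-viewed titles first; titles with no recorded views count as 0.
    ordered = sorted(by_title, key=lambda t: (-page_views.get(t, 0), t))
    # Within a title, oldest submission first (an assumption).
    return [(t, sorted(by_title[t], key=lambda s: s.submitted)) for t in ordered]
```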

If you've submitted anything for any of these lists, please be patient
with us and try as hard as you can to avoid the temptation to resubmit.
Our additions system is remarkably reliable (if a little baffling at
times) and things are seldom lost, so if you've sent it, we've got it.

How to get your goof accepted
-----------------------------
Tim Norris (the TGQ list manager) has this to say on the subject of
preparing your goof submissions:

I can't guarantee that your goof will appear on the site even if you
follow these 10 helpful hints (the list manager's decision is final,
no correspondence will be entered into, please keep your feet off the
jump seat, your home is at risk if you do not keep up repayments on a
loan secured on it, etc) but you can lessen the chances of your efforts
being thrown out if you:

1. Think again.  Was it really a goof?  Did his jacket really disappear,
or did he take it off while you were looking down the back of the sofa
for the remote?  Double check if you can.  Ask yourself if it might be a
joke (a lot of supposed goofs are actually jokes).  Then try to explain
it away somehow.  Only submit something as a goof when you're absolutely
sure it's a goof.

2. Double check "factual errors".  Many of the "facts" we get are not, in
fact, facts at all.  It's not burdensome, and it can often be quicker to
check something and find out that you were wrong than to go through our
impenetrable additions interface and submit it as a goof, so you might
actually be saving yourself time and effort.  Only submit something as
a goof when you're absolutely sure it's not right.

3. Don't tell us about differences between the movie and the original
book, comic book, radio series, TV show, computer game, magazine, beer
mat or bubblegum wrapper.  These aren't goofs.

4. Read the existing goofs carefully.  Have we already got it listed?
Only submit something as a goof when you're absolutely sure we haven't
already got it.

5. Use characters' names, not actors' names.

6. Check your spelling (especially characters' names, which you should
always use instead of actors' names, by the way).  Don't worry too much
about style and grammar (that's what editors get paid for) but the less
work I have to do, the more it's going to look like your submission when
you see it online.

7. DON'T SHOUT. And! Don't! Litter! Your! Text! With! Exclamation! Marks!
(I really don't like them)

8. Don't go overboard when describing the goof, but do try to give some
helpful detail to identify the scene.  Not all the detail will be used
in the finished version, but the more I've got, the more easily I can
understand what you're saying and check your submission.  If you think you
need to add a time from the DVD version (you really don't have to bother,
but some people like to), please let me know which Region version it is -
they run at different speeds.

9. Be polite.  If I've made a mistake, just tell me and I'll put it right.
It's not the end of the world and it can be fixed.  It doesn't get done
any quicker or any better if you're abusive or snotty, but I do invest a
few extra moments in sticking a couple of extra pins into our Rude User
voodoo doll.  Would your mother approve of your talking to strangers
like that?  Well then.

10. Relax.  This is one of the "fun stuff" lists.

Title display on IMDbPro
------------------------
For those of you using IMDbPro, you may have noticed that some titles
display differently.  Since our Pro customer base is primarily located
in the USA, we display USA titles whenever we have them.  Therefore,
it's important that aka titles be marked accurately with the country
whenever possible.  At some point in the future, we will allow people
to choose their desired country.

Running times without countries
-------------------------------
In the past, running times always had a country attached, and there
could be several conflicting entries. We've now introduced the concept
of the default running time, which corresponds to the running time of the
original release in the country of production. This time is displayed
without a country, and, importantly, we now accept only times that differ
significantly from the default (owing to censorship, extended versions,
etc.). We found that small variations in submitted running times are
usually due to timing errors or to reliance on third-party sources
(e.g., newspapers) that round times to the nearest 5 minutes. If a
version of a title with a different running time has been released in
your country, please also submit an entry to the alternate versions
section explaining the changes wherever possible.
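As a rough illustration of the acceptance rule, here is a sketch that compares a submitted running time against the default.  The 5-minute tolerance is purely hypothetical (no figure is given above); it was chosen only because rounding to the nearest 5 minutes can shift a time by up to 2.5 minutes.

```python
# Hypothetical tolerance: the newsletter gives no figure for "significantly".
# Times copied from listings rounded to the nearest 5 minutes can be off by
# up to 2.5 minutes, so differences within this margin are treated as noise.
TOLERANCE_MINUTES = 5


def accept_country_runtime(default_minutes: int, submitted_minutes: int,
                           tolerance: int = TOLERANCE_MINUTES) -> bool:
    """Return True if a country-specific running time differs enough from the
    default to be worth recording (e.g. a censored or extended version)."""
    return abs(submitted_minutes - default_minutes) > tolerance
```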

A reminder to contributors in Europe and other regions of the world with
a 25 frames/second video system: TV and video recordings run
approximately 4% faster than the theatrical release. Please do not submit
video/TV running times based on manual timings from home viewings.
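Where the 4% comes from, assuming a film shot at 24 frames/second and transferred frame for frame to 25 frames/second video (the common practice, though not universal):

```latex
\text{speed-up} = \frac{25}{24} \approx 1.0417,
\qquad
t_{\text{video}} = t_{\text{film}} \times \frac{24}{25} = 0.96\, t_{\text{film}}

t_{\text{film}} = 100\ \text{min} \;\Rightarrow\; t_{\text{video}} = 96\ \text{min}
```

So a 100-minute film runs about 96 minutes on such video, a 4-minute gap that comes entirely from the playback speed rather than from any cut.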

We will eventually modify the submission process to accept running
times without countries specified; in the meantime, use the country of
production or first release whenever possible.

Plots and biographies
---------------------
A reminder that all plot summaries and mini-biographies must be your
own original work.  We have seen a large number of biographies copied
from official web sites or obituaries.  If you have permission from an
official web site, you need to include a comment to that effect with your
submission (or better yet, have someone connected with that site write to
us).  The same holds for plot summaries: if it's not your original work,
it needs to be credited properly and we need to know you have permission.

In biographies, please check the "other works" section if appropriate
before submitting trivia; items should not appear in both sections,
and the "other works" section is preferred when both are possible.
Titles of plays and other works should not be submitted in all caps.

Soundtrack submissions
----------------------
The soundtrack section includes only information on soundtracks of
the movie itself, not soundtrack albums.  This is for several reasons.
Often a movie soundtrack is not released separately.  The music
released on an LP/CD frequently does not match the music in the movie.
Sometimes, different soundtrack albums are released with different music
(often including songs only "inspired by" the movie).

The complete soundtrack guidelines can be found at

http://www.imdb.com/Guides/soundtracks
