Contributor Newsletter #4
Posted by admin in Newsletter on January 14th, 2003
In this issue
-------------
- Name corrections
- New TGQ acceptance policy
- Processing cycles
- Composers of non-original music
- Notes on various list policies
- 2002: Year in review
- Correcting series years
- Release date changes
- UNIX tools 3.19 released
- Questions
Making Name Corrections and the Name Space
------------------------------------------
by Duncan Smith, name manager
Introduction
IMDb now records more than 1.25 million separate names. Most of them are
individuals, although some are groups (for example, rock bands, choirs,
dance troupes and orchestras). Some are animals and there are even a few
inanimate objects. The intention is to allow - as a name - any person,
group, animal or object that receives screen credit as a cast or crew
member, with the exception of commercial companies (which have their
own, separate name space).
Numerals
It is not unusual for people to share the same name, and that is why we
use roman numerals to distinguish them: (I), (II), (III), (IV) and so on.
Where someone is of sufficiently high profile, such as the famous American
director John Ford, we remove the numeral for display purposes in order not
to confuse people who are unfamiliar with the way we do things. Even then,
these "no numeral" names are in effect (I): the others are numbered (II)
and higher. We call the name that someone is usually known by - their stage
or working name - their PRIMARY NAME.
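To illustrate the numbering scheme, here is a minimal Python sketch that
splits a stored name into its base form and its position among people who
share that name. The function names and the regular expression are
assumptions made for this example; they are not IMDb's own tools.

```python
import re

# Assumed pattern: an optional trailing roman numeral in parentheses, e.g. "(II)".
_NUMERAL = re.compile(r"^(?P<base>.*?)\s*\((?P<numeral>[IVXLCDM]+)\)$")

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a roman numeral such as 'IV' to an integer."""
    total = 0
    # Pair each character with its successor; a space marks the end.
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = _ROMAN[ch]
        # A smaller value written before a larger one is subtracted (IV = 4).
        if nxt in _ROMAN and _ROMAN[nxt] > value:
            total -= value
        else:
            total += value
    return total


def split_name(name):
    """Return (base name, ordinal). A name with no numeral counts as (I)."""
    name = name.strip()
    match = _NUMERAL.match(name)
    if match is None:
        return name, 1
    return match.group("base"), roman_to_int(match.group("numeral"))
```

With this sketch, split_name("Doe, John (II)") gives ("Doe, John", 2),
while a "no numeral" name such as "Ford, John" is treated as number 1.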
Naka
Many in the film industry end up being credited under more than one name.
This may be because they have deliberately changed their name at some point
- for example a woman who changes her name upon marrying, or someone who
moves to a country with a different language and doesn't want to use a
"foreign sounding" name - but a single name may also take several forms:
someone called John may also be known as Johnny or Jack. People may also
sometimes use initials or middle names and other parts to their names (such
as Jr., or 'junior'). Names are even misspelled on prints!
We call such alternative names naka, short for "name aka." Aka in turn
stands for "also known as." We say naka to distinguish name akas from
title akas, alternate titles for films. It is important to know that
the namespace for nakas is kept completely separate from that for the
names themselves. In particular, there are no roman numerals in the
naka namespace, even when the naka is the same as someone else's primary
name. This may seem arbitrary, but in fact helps with the management of
the name spaces both for primary names and nakas.
Note that naka data should not be used in an attempt at somehow linking
two names together or as a substitute for namecorrecting. Such "two-way"
nakas are confusing and make the true primary name obscure. If two
filmographies belong to the same person, tell me - either by using
CORRECT-NAME or COMMENT-NAME, or by using NAMECORRECT. Much more on this
below!
Accents and non-Roman Alphabets
I receive many requests to change a name by adding accents or changing the
case of part of a name. I often reject such requests, for many reasons.
I will deal first with names in the Roman or Western alphabet.
Just because a name is spelled with an accent in a language such as
French, Spanish or Italian does not mean that we should store it that
way. In the first place, film credits are often presented in upper case,
and in some languages capital letters are not accented. The majority
of someone's credits may well be without accent. Second, if someone
moves to an English-speaking country they may quite deliberately stop
using the accented form of their name, or find that credit compilers -
unfamiliar with the language - drop the accents, cannot generate them
on their computer or typewriter, etc.
Some languages, such as German and Dutch, use parts of names that may
begin with a lower case letter. I have been told that the rules for these
two languages are that von, van, vander, etc. always begin with a lower-
case 'v'. Not all our names are represented this way yet but in time I
will fix things up. Please do NOT, in the meantime, rearrange or correct
such names just for the sake of consistency. I will get around to it!
The situation for de, di, de la, etc. is more complicated. I have heard
of rules for French names involving distinguishing between aristocratic
names associated with land ownership ('de') and those derived from the
Flemish ('De'), but in practice this may be difficult. As far as the
Italian 'di' and the Spanish/Portuguese 'de la' are concerned I have yet
to fix on particular rules. Italian and Spanish correspondents are
welcome to contact me directly, but note again that capitalization may
depend on country of residence, not the linguistic origin of the name.
Languages that are based on a different alphabet such as Russian,
Japanese, Chinese, Korean, etc. are paradoxically sometimes easier to
represent. For example, Korea has recently fixed on a standard Roman
form: the last name is capitalized, but only the first part of the
first name is - the second part of the first name is all lower case:
Kim, Dae-hyeon
Note that Chinese names, although apparently similar, always use
capitalized parts. For Russian and Japanese names, I tend to rely on
the expertise of contributors from these countries. Please note that we
are not yet completely consistent in our representation of non-Roman
names.
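The Korean convention described above can be sketched in a few lines of
Python. This is only an illustration: the function name and its arguments
are hypothetical, and it covers the Korean form only, not the Chinese,
Russian or Japanese cases.

```python
def romanize_korean(last, first_parts):
    """Format a Korean name as 'Last, First-second'.

    `last` is the family name and `first_parts` the syllables of the
    given name; both are assumed to be already in Roman letters.
    """
    head, *rest = [part.lower() for part in first_parts]
    # Only the first part of the given name is capitalized.
    given = "-".join([head.capitalize()] + rest)
    return f"{last.capitalize()}, {given}"
```

For example, romanize_korean("KIM", ["DAE", "HYEON"]) produces the form
shown above, "Kim, Dae-hyeon".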
Namecorrecting
If a name is incorrect, the keyword to use is NAMECORRECT. This can be
qualified with the list extension to make a correction that is specific
to the names on that list. This is useful when two or more people have
been combined into a single primary name by accident. This is a far
better approach than deleting all the credits and re-entering them -
deletions will be processed before the new name is created, so credits
can disappear for a week or more which is obviously undesirable. It
also involves a lot more work for us! So if for example the actor
John Doe is different from the costume designer John Doe, you could
separate them by sending the following data:
NAMECORRECT-ACTOR
Doe, John|Doe, John (I)|
NAMECORRECT-COSTUME
Doe, John|Doe, John (II)|
But what if you are combining two names into one, and you think the
two forms of the name should be retained? We have introduced a new
keyword, NAMECORRECTAS. This will namecorrect and add (as ...) at the
same time. Note that this functionality has been available to me when
carrying out management of the name space for quite some time now. So
for example,
NAMECORRECTAS
Doe, Jack|Doe, John|
will result in the credits for Jack Doe transferring to the filmography
for John Doe with (as Jack Doe) added to each. This keyword can also be
qualified by the list extension:
NAMECORRECTAS-ACTOR
Doe, Jack|Doe, John|
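To make the effect of NAMECORRECTAS concrete, here is a minimal Python
sketch of what happens to the credits. The in-memory model (a list of
name, title and attribute tuples), the helper names and the film titles
in the usage note are all assumptions for illustration; the real
processing is done by our internal tools.

```python
def display_form(name):
    """Turn 'Doe, Jack' into 'Jack Doe' (hypothetical helper)."""
    last, _, first = name.partition(", ")
    return f"{first} {last}" if first else last


def namecorrect_as(credits, old, new):
    """Move every credit from `old` to `new`, adding (as ...) to each.

    `credits` is an assumed model: a list of (name, title, attributes)
    tuples. Credits under other names are returned unchanged.
    """
    tag = f"(as {display_form(old)})"
    result = []
    for name, title, attrs in credits:
        if name == old:
            # Keep any existing attributes and append the (as ...) tag.
            attrs = f"{attrs} {tag}".strip()
            name = new
        result.append((name, title, attrs))
    return result
```

Given a credit ("Doe, Jack", "Some Film (2001)", ""), a call to
namecorrect_as(credits, "Doe, Jack", "Doe, John") returns it as
("Doe, John", "Some Film (2001)", "(as Jack Doe)").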
Comment and correct
Please only use COMMENT-NAME and COMMENT-NAMECORRECT to provide further
information on a NAMECORRECT. Please only use CORRECT-NAME and
CORRECT-NAMECORRECT to indicate that a namecorrect is necessary but
could not be performed using the NAMECORRECT/NAMECORRECTAS keywords -
for experienced contributors, that should be almost never!
Group cast names
We are currently inconsistent in this area. It's a mess and we
know it. Eventually, group cast names will be treated as entities in
their own right and anomalies such as (as The Beatles) will disappear.
The credit will be for the primary name "Beatles, The," which will have
its own filmography but with links to the individual members. Meanwhile,
please don't provide group cast naka for individuals, as I reject them.
That is, do not submit 'The Beatles' as naka for, say, John Lennon:
he was never known personally as such - this was the name of the group
of which he was part, not an alternative name for the man himself.
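The primary name "Beatles, The" follows the same convention as film
titles: a leading article moves to the end, after a comma. A minimal
Python sketch of that rearrangement follows; the function name is
hypothetical and only the articles "The" and "A" are handled.

```python
_ARTICLES = ("The", "A")  # the only articles this sketch knows about


def sort_form(group_name):
    """Move a leading 'The' or 'A' to the end, after a comma."""
    for article in _ARTICLES:
        prefix = article + " "
        if group_name.startswith(prefix):
            return f"{group_name[len(prefix):]}, {article}"
    # No leading article: the name is stored as it stands.
    return group_name
```

So sort_form("The Beatles") gives "Beatles, The", and a name without a
leading article is returned unchanged.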
In summary, please submit the group names instead of the individuals
where the credits list them as such and treat them like film titles, so
"The" and "A" should be placed last after a comma.
Summary and Golden Rules
That's a lot of information, I know. Please bear in mind the following
key points:
1. Don't NAMECORRECT to add accents to names unless you have good evidence
that the accent is always (or usually) used. Grammatically based
name corrections will be rejected!
2. Use NAMECORRECT-list or NAMECORRECTAS-list to separate filmographies
when you can. Only use adds and deletes when absolutely necessary.
3. Don't use two-way naka as implicit namecorrects, don't send in group
cast naka for individuals, and remember - naka don't need numerals.
New TGQ acceptance policy
-------------------------
by Tim Norris, co-manager, TGQ lists
During the past 12 months there have been a few changes to the way we
process Trivia, Goofs and Quotes (TGQ). The most obvious change was that
we acquired a new list manager and began to process them at all, working
our way through a large accumulated backlog (thanks to all regular TGQ
contributors for your patience). But there have been other changes as well.
With a new list manager, it's inevitable that there will be a new attitude
and new ideas, and we're slowly redefining our acceptance criteria for
items in the Goofs and Trivia lists. New guidelines are being written
and will be online later in the year, but for now we thought we might
forestall a certain amount of frustration and irritation among T&G
contributors by explaining one of the most important changes.
There's a strong case supporting the notion that IMDb's purpose is to
store all movie-related data and that *all* contributed goof and trivia
information, if true, should appear on the site. But there's a stronger
case for the idea that the T&G lists, designated as they are as "fun
stuff", need to be interesting and relevant as well as "true" (and not
included in other lists). Other lists can, and should, be thorough and
all-inclusive, but the T&G lists are more free-form and exist to inform,
astound and entertain. An unprocessed mind-dump of every single little
thing we receive would have no value - raw data of that sort is largely
useless - and editorial decisions are essential.
The problem, of course, is that it's difficult to empirically define
"interesting", but we do have a simple test which seems to work pretty
well when we've tried it around the team. Whenever we get an item that
falls somewhere in the shadowy no-man's-land between interesting and
ho-hum, we ask ourselves, "Would I be comfortable telling this to a
complete stranger at a party?" It's not infallible, and it has to be
weighted for fan-factor ("Would I be comfortable telling this to a
complete stranger at a convention?"), but as a starting point, it's
not bad. If you don't think your trivia nugget would impress one person
face-to-face, why share it with 13 million IMDb visitors?
As the backlogs clear, there will be more time available to evaluate the
information we already have, as well as to more critically examine the
new information that continues to come in, and we hope you'll see a
marked improvement in the quality of our T&G coverage as well as the
usual continued increase in quantity. We rely utterly on the excellent
stuff we get from you, and between us we can make the lists better than
ever. Over the course of the coming year it will become increasingly
difficult to get trivia and goofs listed, but it's not personal and it
shouldn't be a reason to give up - don't view this as a disincentive,
look at it as a challenge.
Processing cycles
-----------------
In the past, all data was collected into a weekly batch, and processed
on a weekly cycle. Everything sent by 11 PM Pacific time Wednesday would
be collected into a batch and presented to our list managers; they would
then update their lists over the next week, as their schedule permitted.
New names and titles were added earlier in the day on Wednesday so data
associated with them could be included in the weekly batch.
As we have been updating the software we use internally, many lists
have moved to a daily cycle to improve turnaround time. Some of the
lists that have already moved to a daily cycle include trivia, goofs,
quotes, biographies, and the name and title URLs. However, this doesn't
guarantee that all data will be processed each day; where a backlog exists
(TGQ), the new data is still processed in priority order.
We have also moved to a partial daily cycle for new titles. The data
for new titles is still given to the list manager on a weekly basis,
but new titles are approved on a daily cycle. For now, the reference
files are still produced on a weekly cycle, but we plan to move those to
a daily cycle as well. For new titles that clearly meet the standards for
inclusion in the database, this should reduce the delay before they appear.
There is still a large backlog of titles with insufficient data to show
they qualify for inclusion.
Most other lists are still being processed on a weekly schedule, and are
generally processed within one week. The exceptions: If a new name must
be approved, that adds a week to the cycle. If a new company is sent,
the data may go into a backlog. Alternate names (naka) are on a one-week
delay. Besides the trivia, goof, and quotes lists, which have large
prioritized backlogs, the awards, awards-master, and DVD lists also have
large backlogs, and the laserdisc list is not currently being processed.
Composers of non-original music
-------------------------------
A recent change to our displays has highlighted some old but little-known
policies regarding composers of non-original music.
If a composer is credited for music used from another source (usually
classical music), and it's not appropriate to relegate them to the
soundtrack list, their composer credit should include the attribute
"(from ...)" indicating the work(s) of theirs that were used. Examples:
van Beethoven, Ludwig|Clockwork Orange, A (1971)|(from "9th symphony")
Strauss, Richard|2001: A Space Odyssey (1968)|(from "Also sprach Zarathustra")
If a composer's music is reused, and again it's appropriate to put them
in the composer list, but there's no appropriate "from", then the
attribute "(r)" should be added. This can be used when you know the works
of a classical composer were used, but don't know the title, or when music
from an earlier soundtrack was used. Examples:
Bakaleinikoff, Mischa|Bank Alarm (1937)|(r) (stock music) (uncredited)
Berlin, Irving|Star Trek: Nemesis (2002)|(r) (song "Blue Skies")
However, the use with songs should be avoided; these should generally be
included in the soundtrack list. At some future date, the entries in the
soundtrack list will be more completely integrated into filmographies.
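For contributors who check their composer data before sending it, here
is a minimal Python sketch that reads one entry and reports which of the
two attributes it carries. The name|title|attributes layout is taken
from the examples above; the function name and the dictionary keys are
assumptions for this illustration.

```python
import re

# Matches the (from "...") attribute and captures the work's title.
_FROM = re.compile(r'\(from "([^"]*)"\)')


def parse_composer_entry(line):
    """Split a name|title|attributes line and inspect its attributes."""
    name, title, attrs = line.split("|", 2)
    source = _FROM.search(attrs)
    return {
        "name": name,
        "title": title,
        "from": source.group(1) if source else None,
        "reused": "(r)" in attrs,
    }
```

Applied to the Richard Strauss line above, the "from" field holds
"Also sprach Zarathustra" and "reused" is False; for the Bakaleinikoff
line, "from" is None and "reused" is True.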
Policies on various lists
-------------------------
Color list: Entries for newer movies should generally not include any
attributes. Deluxe, CFI, FotoKem, and Technicolor are laboratories
(for newer movies) and should go to the technical list. Information
about film stock (35mm, video, etc) also belongs to the technical list.
Colorized films are not accepted in this list; they should be sent
to the alternate version list.
Certificates: Certificate entries are accepted for video games.
For USA titles, the current MPAA rating system was introduced in
1968. Before that time, movies were approved and issued a PCA
number. For newer movies, there is a certificate number that can be
sent as an attribute like "(certificate #34544)".
There is a difference between Unrated and Not Rated. Not Rated means
the movie has never been rated. Unrated means that there is a regular
rating for the movie, but this version was altered and therefore has no
rating; usually it is a director's cut on DVD with more sex or violence.
Please note that certificates should reflect only the actual rating from
official censorship/rating authorities, not expected ratings.
Countries, Languages: These lists do have sequence numbers that should
be used when possible, but there is no current support for them in the
web interface. Languages should not reflect dubbed languages, unless
that's the only format that was released. Our country list reflects
the country that existed at the time the film was produced; for example,
German films from 1949-1990 should specify East Germany or West Germany.
Sound Systems: In the early days of sound films, a number of different
recording systems (Westrex, RCA Sound System, Western Electric Sound
System) were used. Since all these systems produce an essentially
identical Mono soundtrack, we now list them as Mono|(system) -- for
example,
Rebecca (1940)|Mono|(Western Electric Sound System)
The process of cleaning up some of the existing entries is ongoing.
For newer movies, "Stereo" is usually not sufficient; instead, the
correct system (DTS, Dolby, Dolby Digital, etc.) should be used.
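The Mono|(system) convention for early sound films can be sketched as
follows. The set below contains only the three systems named above, the
function name is hypothetical, and the plain title|system layout used
for every other system is an assumption.

```python
# Early recording systems named in the text; all yield a Mono soundtrack.
EARLY_MONO_SYSTEMS = {
    "Westrex",
    "RCA Sound System",
    "Western Electric Sound System",
}


def sound_entry(title, system):
    """Build one sound-mix line for `title`."""
    if system in EARLY_MONO_SYSTEMS:
        # List the mix as Mono and keep the system as an attribute.
        return f"{title}|Mono|({system})"
    # Assumed layout for any other system (DTS, Dolby Digital, ...).
    return f"{title}|{system}"
```

sound_entry("Rebecca (1940)", "Western Electric Sound System") yields
the same line as the Rebecca example above.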
Running Times: For TV series, the running time per episode is used,
not the total running time. If a series is no longer in production,
the total number of episodes can be added as an attribute.
Biographies / comments & corrects: Please don't use the COMMENT and
CORRECT keywords to contribute data; use the BIOGRAPHY keyword. Also,
corrections to existing data should go only under CORRECT, not both.
2002: The year in review
------------------------
Once again, we've broken all our records over this past year. The volume
of contributions has grown from almost 120,000 items weekly to over
143,000. That includes an average of over 750 new titles approved each
week, along with a great deal of data on existing titles. We've begun
revising our tools to make it easier to process this increased volume,
along with the expected further increase once our new additions system
is launched. As noted above, many lists are now being processed on a
daily cycle, and more are on the way.
We've also been able to add many more photos, with galleries from major
events like the Academy Awards appearing overnight or even sooner.
Not only has the amount of data included and added to the database
continued to grow, so has the traffic. Our monthly unique visitor count
has grown from almost 10 million to over 13 million in the past year,
and our traffic has continued to set records, with several days over
12 million page views. The technology running the site is well on the
way through a complete revamp to allow us to handle this volume and to
make changes easier. The biggest recent change, of course, has been the
new message board system, which has been much more popular than the old
system (as measured by page views); this includes, for the first time,
message boards associated with each name page. Now that much of our
underlying technology has been replaced, we expect to introduce many
more new features in the coming year.
Correcting series years
-----------------------
The year range for a TV series often needs to be corrected, particularly
with current series as they are renewed or cancelled. Our preferred
way of doing this (for now) is via a TITLECORRECT, specifying the
new year range. For example:
TITLECORRECT
"Oz" (1997)|"Oz" (1997-2003)|
"NYPD Blue" (1993)|"NYPD Blue" (1993-????)|
This will not affect the actual title of the series; data for the above
series would continue to appear under "Oz" (1997) and "NYPD Blue" (1993).
This is just the most efficient way to communicate the year change to the
title editor, who has to make these changes by hand in any case.
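If you prepare these lines with a script, a minimal Python sketch might
look like this. The function name and arguments are hypothetical; the
output simply mirrors the two examples above, with "????" standing for a
series that is still running.

```python
def series_year_correction(title, start, end=None):
    """Build one TITLECORRECT data line for a series year range.

    `end=None` means the series has not ended, which is written '????'.
    """
    end_text = "????" if end is None else str(end)
    return f'"{title}" ({start})|"{title}" ({start}-{end_text})|'
```

series_year_correction("Oz", 1997, 2003) and
series_year_correction("NYPD Blue", 1993) reproduce the two data lines
shown above; the TITLECORRECT keyword itself still goes on its own line
before them.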
If you are sending the title of a previously unknown series that has
already ended, you should include the years in the TITLE data as always:
TITLE
"I Love Lucy" (1951)|1951-1957|
Release date changes
--------------------
We recognize that release dates of new films change frequently. While
it's a good idea to provide a comment to any change that isn't quite
clear, the movement of a release date is usually clear and doesn't need
a comment. Exceptions are when you change a release date from a
specific date to a vaguer one, or when you provide a date in a foreign
country that is earlier than the date in the film's native country.
UNIX tools 3.19 released
------------------------
The moviedb package (a local UNIX version of the database) has again
been updated to correct some capacity problems. Version 3.19 was
released in late October, and is essential if you are using the
current data files. It can be found at the usual FTP sites;
see http://www.imdb.com/interfaces for details.
Installation remains the same as for earlier releases. Note that if
you are using the X Windows interface, xregal, it cannot be compiled
with most new releases of X. However, the changes in this release do
not require recompilation of xregal, so if you have a working binary
of xregal, keep using it. Alas, the author of xregal has chosen to
stop supporting it, so a newer version is not available.
To rebuild: Extract the tar file into a directory named database.
Assuming you already have a copy of the database files, from ./database/ :
make compile
make installbin
cd imoviedb; make; make install
# If you need to build xregal and are able to:
# cd ../xregal; make; make install
cd ..
make cleandbs
make update-local
./etc/cgencompl -all # optional
If it's not working for you, check the following things first:
. Do you have enough disk space?
. Are the source files for moviedb up to date?
. Are all the binaries in database/bin/ and database/etc/ up to date?
. Did you do *all* relevant steps above in the order listed?
For further support, contact unix@imdb.com.
Questions
---------
Q: I found a title that doesn't meet the criteria you mentioned last issue.
A: Yes, the criteria have been tightened over time, so it's likely some
titles already in the database lack enough data to prove their eligibility.
You can point this out to the title manager with COMMENT-TITLE if you are
sure it does not meet the standards. This includes announced projects
that were never made.
Q: Are release dates essential for old movies?
A: Not necessarily. The article last issue was aimed at new releases, as
they give people the most problems. For older movies, simply indicating a
major studio and well known cast or crew member will probably suffice.
However, release dates for older movies are generally pretty easy to find.
You should still include the year of release in the title, if possible.
Q: Do you accept music videos?
A: Music videos are currently accepted as titles only if they are released on
DVD / VHS (generally, these are compilations or sometimes longform videos).
The individual titles within such compilations are not accepted.
Music videos can be included in the "other works" section of biographical
information.
Q: How do I create a reference to a person or title in a trivia item?
A: Use the "qv-reference" format. For a person, enclose the name in
apostrophes; for a title, enclose it in underscores. Thus: 'John Doe' (qv)
appears in _Title (2002)_ (qv). Please be sure to use character names
instead of actor names wherever relevant. (The same format works in most
of the free-format text lists: goofs, soundtracks, biographies, etc.)
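To check the references in an item before you submit it, something like
the following Python sketch can list them. The function name and the two
regular expressions are assumptions based only on the format described
above; in particular, a name that itself contains an apostrophe would
not be matched by this simple pattern.

```python
import re

# 'Name' (qv) for people, _Title (year)_ (qv) for titles.
_PERSON = re.compile(r"'([^']+)' \(qv\)")
_TITLE = re.compile(r"_([^_]+)_ \(qv\)")


def find_references(text):
    """Return the people and titles referenced in a free-format item."""
    return {
        "people": _PERSON.findall(text),
        "titles": _TITLE.findall(text),
    }
```

For the sample sentence above, this returns "John Doe" under "people"
and "Title (2002)" under "titles".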
---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 4 - THE END