Archive for October, 2002

Contributor Newsletter #3

In this issue
-------------
 - New message board
 - New keyword: LOCATIONCORRECT
 - Getting a new title accepted
 - New additions system in development
 - Cast order revisited
 - Update on trivia, goofs, and quotes
 - Short questions

New message board
-----------------
Since the last newsletter, a new message board system has launched on the
site.  Along with a number of new features, we have also added a special
message board for people who contribute data, called "Contributors Help."
This board is actively monitored by IMDb staff, who attempt to respond
as necessary (though we're gratified that our community of visitors
handles some of the common queries for us).  You can visit this new
message board at http://www.imdb.com/board/bd0000042/threads/ or just
click on the "Message Boards" tab at the top of any page and scroll down.

New keyword: LOCATIONCORRECT
----------------------------
In response to a comment on the message board, a new keyword has been
created: LOCATIONCORRECT.  This keyword is used to correct entries in
the location list where the location is misspelled, or does not fit
properly into the hierarchy at http://www.imdb.com/LocationTree .  It
should not be used if the entry for a movie is a valid real-world location,
just not where the movie was filmed.

To use the keyword, send something like this to the mail server:

LOCATIONCORRECT
wrong location|right location|

Thus:

LOCATIONCORRECT
Los Angeles, CA, USA|Los Angeles, California, USA|

You must supply the complete location for it to be corrected; thus, in the
example above, correcting just "CA, USA" would not work.
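
For contributors who generate corrections with a script, here is a minimal
Python sketch of the format described above.  The function name
location_correct is hypothetical and not part of any IMDb tool; the sketch
only assembles the text you would send to the mail server.

```python
def location_correct(wrong: str, right: str) -> str:
    """Build one LOCATIONCORRECT submission (hypothetical helper)."""
    for value in (wrong, right):
        # The bar is the field separator, so it cannot appear inside a location
        if "|" in value or not value.strip():
            raise ValueError("locations must be non-empty and must not contain '|'")
    # Keyword on its own line, then wrong|right| including the trailing bar
    return f"LOCATIONCORRECT\n{wrong.strip()}|{right.strip()}|"


print(location_correct("Los Angeles, CA, USA", "Los Angeles, California, USA"))
```

Note that the helper expects the complete location on both sides, as
required above; it does not try to expand a fragment like "CA, USA".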

Getting a new title accepted
----------------------------
One of the more common questions we get from frustrated contributors is:
Why has my new title not been accepted?  As many of you know, there's
currently a processing delay of a few weeks, so sometimes the answer is
just, "be patient," but often the title has been rejected.

The most common reason for a title to be rejected is that it does not
meet our criteria for inclusion.  They are spelled out in our new title
guide at http://www.imdb.com/Guides/new-titles, but the short version is
that a new title must be of "general public interest" and must be/have
been available to the public.

This means that an independent film still in production made by people
with no track record is, generally, ineligible.  But many titles that
are eligible for listing do not get included because the data fails to
demonstrate that they meet the criteria.

If a film has been shown to the public, it's critical to include a
release date.  This can be approximate (e.g., month and year only),
but it should include any appropriate modifiers, like the name of the
film festival, the city for a single-city release, or "(limited)" for a
limited multi-city release.  If the film has a distributor, that should
be included; if it debuted on television, then the name of the television
network should be used as the distributor.  If it's available on video
through established channels, then again, the name of the distributor
and release date are important.  A Google search is often helpful if
it's been shown in festivals (try the title plus the director name).
While including an official URL is helpful, be aware that it's not a
substitute for including the distributor name (even if the URL is a page
within a distributor's web site).

If a film is still in production, it's eligible for listing if there's
a high probability it will be available to the public.  In general,
that means an established production company and/or distributor, or
well-known filmmakers/actors.  In this case, we ask for a certain amount
of information so we can properly track the film.  Films in production
often change title and the people working on them can change, so we need
more than just a title and a single name to allow us to track them.
In any case, if there's only one name connected with the title, it's
not far enough along in the process to have a high probability of being
made (ask anyone who's seen the mountain of unproduced screenplays at
most studios).  With the exception of a few high-profile projects that
have an excellent chance of being made (e.g., Star Wars Episode III),
we don't track films until they are solidly into the preproduction phase.

The volume of data we process means we generally ask the contributor to do
the research to show that a title qualifies; we add over 1000 new titles
to the database in some weeks, and hundreds more are discarded.  Many of
those rejected titles eventually do find their way into the database,
either because someone eventually sends them with enough relevant data,
or because they get accepted by a festival and thus become eligible.

Please note that the only data that counts is formatted data contributed
through our web interface; vague comments like "I hear he's making a
movie with so-and-so" are worthless to us.  However, if there's some
unusual reason why a movie should be included despite not meeting the
normal criteria, please do include a comment explaining why along with
the other data (which should still include as much formatted data
as possible).  That comment should be included as a COMMENT-TITLE.
One example that comes to mind: George Lucas in Love, which was widely
available to Hollywood insiders and was written about in several major
publications before it was accepted by a festival.

New additions system in development
-----------------------------------
As has been mentioned on the message board, a replacement for the current
additions system is in development.  While it is still some weeks away,
some of the main features can be discussed now.

In the existing additions system, the web interface serves only to format
data for the mail interface; all data goes through the mail interface.
In the new system, the web interface will become the primary means of
contributing data.  This means that many of the more mysterious rejection
messages will no longer appear, as there will no longer be a disconnect
between what the web interface accepts and what the mail server accepts.

An interface will be available for bulk contributors, but it will feed
into the web interface.  Rather than the current mail interface, you
will upload a file and get immediate feedback.

As there are a number of things that can be correctly sent through the
mail interface today that cannot be created with the web interface,
this will mean a complete reworking of the web interface to allow all
valid data to be sent.  This will also mean errors are detected closer to
their source, which should allow us to give more helpful error messages.
The web interface will probably be deployed on a section-by-section basis;
the first section is currently projected to be release dates.

Existing data will be much easier to correct with the new system as well,
with easy-to-use forms instead of the current complicated process.

One of the best features of the new system will be the ability to see what
data you've sent that is still pending and its current status.  You will
also be able to modify or add to data you've sent that is still pending.
Also, in our long range plans, you'll be able to see and comment on data
from other contributors that is pending.

We expect that the web interface will feed more directly into our
back-end processing tools, which should improve the turnaround time for
processing data.  Among other things, it means data will no longer be
held in weekly batches, but will become available to list managers on
a daily, or even continuous, basis.  (This doesn't necessarily mean it
will be processed on a daily basis; some list management may still be
handled on a weekly cycle for a while.)

As the new additions system progresses, we may ask for volunteer beta
testers, most likely on the message boards.

Cast order revisited
--------------------
It's been pointed out that the article in the last issue on cast ordering
was a bit oversimplified.  To reiterate that article: The cast list order
is determined by the most comprehensive cast list.  In modern films, that
list is usually in the closing credits.

However, some productions (notably TV movies) split the cast list, with
the major stars listed in the opening credits and only the supporting
players listed in the closing credits.  In such cases, the cast list
should be treated as a single cast list, interrupted by the movie itself.
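
One possible reading of the rule above, as a short Python sketch.  The
function name and the placeholder cast names are hypothetical, and the
test for "most comprehensive" (the closing credits include everyone from
the opening credits) is our simplifying assumption, not an official
algorithm.

```python
def cast_order(opening: list[str], closing: list[str]) -> list[str]:
    """Return the cast order implied by a film's credits (hypothetical helper)."""
    # Assumption: if the closing credits repeat everyone from the opening
    # credits, the closing list is the most comprehensive one; use its order.
    if set(opening) <= set(closing):
        return list(closing)
    # Otherwise treat the credits as one list interrupted by the movie:
    # opening credits first, then the closing credits, without repeats.
    return list(opening) + [name for name in closing if name not in opening]


# Split credits: stars up front, supporting players at the end
print(cast_order(["Star A", "Star B"], ["Player C", "Player D"]))
```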

Update on trivia, goofs, and quotes
-----------------------------------
We're pleased to announce that the backlog of trivia, goofs, and quotes
for the 1000 most popular titles in the database (as determined by page
view) has been cleared.  We expect to keep up with new contributions
for the top 1000 titles, and to make progress on the backlog, where we
will continue to focus on the most popular remaining titles.  We have
also begun processing comments and corrections for the top 1000 titles.

Short questions
---------------
Q: Why don't you list movie-links for TV series (e.g., references made by
the series)?

A: Particularly for a long-running series, the list would become unwieldy.
When we support episode titles, we will reconsider this.

Q: Now that episode lists have been split into separate entries, does the
earlier limit of 5 episodes per person still hold?

A: No.

Q: What's the dividing line for a Short versus a feature?

A: 40 minutes, though we allow a few minutes slack as some sources don't
consider credits in their timings.

Q: What's the difference between a miniseries and a TV movie?

A: Anything over 240 minutes running time (excluding commercials) is a
miniseries; anything shorter is a TV movie, even if it's broadcast in
multiple installments.

Q: Now that you accept animal credits, you should also indicate the type
of animal.

A: Good idea.  We'll think about adding a special place for this; in the
meantime, please submit an appropriate biography trivia entry.

Q: Where should the titles of individual episodes of theatrical serials
be placed?

A: For now, send them as trivia entries for the serial title.

Q: Are quotes accepted in any language other than English?

A: No.
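
The running-time cutoffs given in the answers above (40 minutes for a
short versus a feature, 240 minutes for a TV movie versus a miniseries)
can be written down as a small Python sketch.  The function names are
hypothetical, and the handling of a running time of exactly 40 or exactly
240 minutes is an assumption, since the answers do not cover it.  The
"few minutes slack" for sources that leave out the credits is a judgment
call and is deliberately not modeled.

```python
SHORT_LIMIT_MINUTES = 40
MINISERIES_LIMIT_MINUTES = 240


def classify_theatrical(minutes: float) -> str:
    """Short versus feature, by running time."""
    # Assumption: exactly 40 minutes is treated as a feature
    return "short" if minutes < SHORT_LIMIT_MINUTES else "feature"


def classify_tv(minutes_excluding_commercials: float) -> str:
    """TV movie versus miniseries, regardless of the number of installments."""
    # Assumption: exactly 240 minutes is treated as a TV movie
    if minutes_excluding_commercials > MINISERIES_LIMIT_MINUTES:
        return "miniseries"
    return "TV movie"


print(classify_theatrical(25), classify_theatrical(95))
print(classify_tv(180), classify_tv(360))
```
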
---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 3 - THE END


Contributor Newsletter #2

This is the IMDb contributor's newsletter, published every 6-8 weeks.
To unsubscribe, send a message to data-news-unsubscribe@mlists.imdb.com.
To subscribe, send a message to data-news-subscribe@mlists.imdb.com.
You can also use the signup page at http://www.imdb.com/maillists .

Feedback on these articles or suggestions for new topics are welcome;
contact dnews@imdb.com.  The most interesting questions will be used
in the next issue.

Issue #2

In this issue
-------------
 - What happens when you submit data
 - Cast credit order
 - Historical figures in cast lists
 - Episode lists
 - Some comments about AKA titles
 - UNIX tools 3.18 released
 - Feedback

What happens when you submit data
---------------------------------
People sometimes wonder what happens to their data after they submit it.
It is not placed online immediately.  Once the mail server has accepted
your data, it is accumulated until about 8 AM GMT Thursday (11 PM PST
Wednesday).  The entire week's data is sent to the managers of the
various portions of the database.  Each list manager then extracts
the data for the parts of the database they are responsible for.
The data is sorted and duplicates are eliminated.  The list managers
spend some time making sure the data is formatted correctly and checking
for various inconsistencies, such as people working before their birth
or after their death (this may indicate two people with the same name,
but not necessarily).  Some data is checked against official sources.
As the various database managers complete work on a list, they upload
their information; the database is rebuilt nightly using whatever's been
added that day.  Some browsable sections of the database are rebuilt on
a weekly cycle.
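
To see which weekly batch a submission will fall into, the cutoff
described above can be computed with a few lines of Python.  This is only
a sketch: the function name next_cutoff is hypothetical, the cutoff is
taken as exactly 8 AM GMT on Thursday even though the real time is
approximate, and nothing here reflects how the mail server itself is
implemented.

```python
from datetime import datetime, timedelta, timezone

CUTOFF_WEEKDAY = 3   # Thursday (Monday is 0)
CUTOFF_HOUR_UTC = 8  # "about 8 AM GMT", treated as exact for this sketch


def next_cutoff(submitted: datetime) -> datetime:
    """Return the first weekly cutoff at or after a timezone-aware time."""
    t = submitted.astimezone(timezone.utc)
    days_ahead = (CUTOFF_WEEKDAY - t.weekday()) % 7
    cutoff = (t + timedelta(days=days_ahead)).replace(
        hour=CUTOFF_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    # Submitted on Thursday after the cutoff: wait for next week's batch
    if cutoff < t:
        cutoff += timedelta(days=7)
    return cutoff


print(next_cutoff(datetime(2002, 10, 1, 12, 0, tzinfo=timezone.utc)))
```

Keep in mind that the cutoff is only when data is handed to the list
managers; the delays for new titles and names described below come on
top of it.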

That's the normal cycle.  However, when you add a new title, it has to
go through additional processing.  Because people often submit titles
that are not really new, or are not appropriate for inclusion, each
title must be examined and approved manually, based in part on the data
submitted along with the title, which is why it's important to submit as
much information as possible along with a new title.  This currently adds
two to four weeks to the cycle; data will not appear online until the
title it is associated with has been approved.  In addition, new names
must also be approved for similar reasons; this adds about a week delay.
If data is submitted to the wrong list (e.g., a casting assistant,
which belongs in the miscellaneous crew list, submitted to the casting
directors list), rerouting it adds another week or two.

While a title or name is awaiting approval, the data is kept to one side.
After the title/name is approved, the data is normally included the next
time the list is processed, which means it should appear within a week.
Unfortunately, it does sometimes get lost if there is an unusually
long delay or other problems; we are working to reduce the number of
these cases.

For certain kinds of data, additional work is needed.  Submissions of
URLs for new sites are verified to be sure the site meets our guidelines
of appropriateness (for example, sites submitted for a title must pertain
to that specific title, not a company or actor).  Finally, those lists
with free-form text need manual copy editing for wording and duplicates.
This takes varying amounts of time for the various lists, based on
submission volumes and quality, along with the backlog for those lists
(see the article last issue about the "TGQ" lists).  A reminder that
the TGQ backlog is processed in priority order; we've made excellent
progress in the last 2 months.

For reasons of timeliness, some information provided by IMDb staff
bypasses part of this process.  Most notably, editors collect box office
data and links to reviews at some web sites; these are updated in the
nightly build mentioned earlier.  We also update biographies when someone
notable dies, and information for certain high-profile awards will also
appear online much faster.  On the IMDbPro site, some of this information
doesn't even have to wait for a nightly build.

Over the next year, we hope to streamline the submission process,
eliminating weekly batching and making changes that should reduce the
number of bad title submissions.  There will also be opportunities to
see and comment on data that has been submitted but not processed.

This process has already begun; for example, URLs are processed daily,
not weekly.

Cast credit order
-----------------
The cast of a film is one of two sections of the database that does
not necessarily appear in alphabetical order (the other is the writing
credits).  The rule for determining this order can be confusing, since
we don't necessarily list the biggest stars first.

The rule is this: the correct order for credits is that of the most
comprehensive cast list, which in modern films is usually at the end.
If that leaves the stars way down in the list, so be it.

We do have another system for marking principal cast members that we have
not yet fully deployed; that will allow us to feature those actors on
the overview page regardless of cast order.  We also expect to some day
flag whether the cast is billed alphabetically or in order of appearance
(the two most common counter-billing orders).

Historical figures in cast lists
--------------------------------
We have many appearances by historical figures playing themselves
(e.g. Richard Nixon).  It's very hard to draw the line here because in
some cases those credits are valid and useful to have.  Keeping Nixon
as an example, some cases where the 'credit' is valid:

# "Cold War" (1998) (mini)
# Reel Radicals: The Sixties Revolution in Film (2002) (TV)
# Making of a Leader (1919-1968), The (1994) (TV)
# Houston, We've Got a Problem (1994)
# Secret Life of Richard Nixon, The (2000) (TV)

In other cases the credit is superfluous and should go.  For example:

# Frequency (2000)
# Contact (1997)
# Doors, The (1991)

Even though footage of him was used in those films (and we mark his
appearance as 'archive footage'), these appearances do not belong in
the main cast list.

The problem is that in many cases it's hard or impossible to make the
distinction unless you are familiar with the film.  For example, when
you see a credit like "Watergate" (1994) (mini), you don't really know
if this is a legitimate documentary appearance or some fictional,
fact-based program that uses footage of Nixon the same way Contact (1997)
or JFK (1992) do.

In some cases it can be determined by checking the data we have on the
title (whether it's a documentary, whether all other credits are for
professional actors or historical figures, etc.), but that requires tools,
time, and effort that we don't have right now.

We do reject many similar credits (many appearances by Hitler, Bill
Clinton, JFK or other historical figures are rejected every week).
The ones that are listed in the database managed to creep into the lists.
Not all submitters share the view that we should reject those credits;
some see them online, assume that this is the norm, and send more of them.
All this makes it harder to reject them, especially if we are not familiar
with the titles involved.

At this time we are erring on the side of accepting the credits when
in doubt, and possibly removing them later when they are determined to
be invalid.

At some future time, we may create another way of listing such appearances
that would clearly separate them from the main cast list.

Episode lists
-------------
As the coverage of television episodes has grown, some crew members have
accumulated episode lists that have become unmanageably long.  A more
comprehensive solution to episodes is in the works, but until it arrives,
we are using another approach.  Where in the past you may have submitted
multiple episodes in a single entry, like this:

Spotnitz, Frank|"X Files, The" (1993)|(episodes "Alone (2001)", "Daemonicus (2001)")

you should now submit each episode separately, like this:

Spotnitz, Frank|"X Files, The" (1993)|(episode "Alone (2001)")
Spotnitz, Frank|"X Files, The" (1993)|(episode "Daemonicus (2001)")

Existing entries are being converted.  In some cases, episode lists
were temporarily replaced with "(multiple episodes)"; these should be
converted back shortly as well.
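
If you keep your own files in the old multi-episode format, a conversion
along these lines may help.  This is a minimal Python sketch, not the
tool we use internally; the function name split_episodes is hypothetical,
and it assumes each entry has exactly the three bar-separated fields
shown in the examples above, with episode titles in double quotes.

```python
import re


def split_episodes(entry: str) -> list[str]:
    """Split an '(episodes ...)' entry into one '(episode ...)' entry each."""
    name, title, attribute = entry.split("|", 2)
    match = re.fullmatch(r"\(episodes (.+)\)", attribute.strip())
    if match is None:
        # Already a single-episode entry, or not an episode entry at all
        return [entry]
    # Each episode title is a double-quoted string inside the attribute
    episodes = re.findall(r'"([^"]*)"', match.group(1))
    return [f'{name}|{title}|(episode "{episode}")' for episode in episodes]


old = 'Spotnitz, Frank|"X Files, The" (1993)|(episodes "Alone (2001)", "Daemonicus (2001)")'
for line in split_episodes(old):
    print(line)
```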

Some comments about AKA titles
------------------------------

After the last newsletter, we got some feedback about aka titles
(alternative titles) in IMDb.  This is a response to those remarks.

We mentioned in the last issue that IMDbPro displays USA titles
where available.  This includes only those titles marked (USA)
with no additional attributes like (informal English title).  Thus,
a title that is only a translation used in a review and not an actual
release title should be marked appropriately; other possibilities are
(informal literal English title) and (video title).  Unfortunately,
we add about 1000 aka titles each week, and are unable to investigate
each one in depth.  It's thus more important than ever to be sure to use
the correct attributes on alternative titles.  If a title should have an
attribute and does not, please use CORRECT-AKA to point out the omission.
The attribute (theatrical title) should only be used for, and is only
present on, TV movies, mini-series, and video titles that would not
normally get a theatrical release.

There are many alternate titles with no attribute from the time before
we attached attributes to alternate titles; again, if you know what the
correct attribute should be, please report it with a CORRECT-AKA.

The year in an aka title should correspond to the year that title was
used.  Our tools will, by default, force the year in the aka to match
the year in the primary title.  However, if the aka title specifies
a country and we have a release date in that country with a different
year, the year in the aka title will be corrected to match.  Years in
less structured lists, such as distributor attributes, cannot be used
for this purpose; the only release years that really matter are those
in the release date list.
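
The year rule above can be summarized in a short Python sketch.  The
function name aka_year and the shape of the release_years argument (a
mapping from country to the release year we have on file for it) are
assumptions made for illustration; our actual tools are not shown here.

```python
from typing import Dict, Optional


def aka_year(primary_year: int,
             aka_country: Optional[str],
             release_years: Dict[str, int]) -> int:
    """Return the year an aka title should carry (hypothetical helper)."""
    # A country on the aka plus a release date for that country wins
    if aka_country is not None and aka_country in release_years:
        return release_years[aka_country]
    # Default: force the aka year to match the primary title's year
    return primary_year


print(aka_year(1999, "France", {"France": 2000}))
print(aka_year(1999, None, {"France": 2000}))
```

Only years from the release date list belong in release_years; years
taken from distributor attributes should not be passed in.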

Finally, we recognize that aka titles for languages using non-Roman
alphabets are not always consistent.  While our title manager is fluent in
four languages, there are many languages where his knowledge is minimal
to zero.  Correcting transliterations requires detailed knowledge about
the original language, character set and transliteration rules.  We do
not have this knowledge for Japanese, Russian, Indian languages etc.
We depend on the knowledge of our users here.  The usual ways of
correcting data apply here as well.  There is no satisfactory solution
to this problem until experts are available who can review the complete
set of titles for one language and enforce standards on every single
title.

UNIX tools 3.18 released
------------------------
Version 3.18 of the locally installed version of the database package
(moviedb) has been released.  It can be found at the usual FTP sites;
see http://us.imdb.com/interfaces for details.

There is one major change in this release.  The previous versions
could only handle 60,000 titles with votes; that limit has been removed
in this version.  In addition, various compile-time warnings should
no longer occur.

Installation remains the same as for earlier releases.  Note that if
you are using the X Windows interface, xregal, it cannot be compiled
with most new releases of X.  However, the changes in this release do
not require recompilation of xregal, so if you have a working binary
of xregal, keep using it.  Alas, the author of xregal has chosen to
stop supporting it, so a newer version is not available.

To rebuild:  Extract the tar file into a directory named database.
Assuming you already have a copy of the database files, from ./database/ :

make compile
make installbin
cd imoviedb; make; make install
# If you need to build xregal and are able to:
# cd ../xregal; make; make install
cd ..
make cleandbs
make update-local
./etc/cgencompl -all # optional

If it's not working for you, check the following things first:

 . Do you have enough disk space?
 . Are the source files for moviedb up to date?
 . Are all the binaries in database/bin/ and database/etc/ up to date?
 . Did you do *all* relevant steps above in the order listed?

For further support, contact unix@imdb.com.

Feedback
--------
Thanks to the people that commented on the first issue of the newsletter.
By far the most popular questions centered around our processing cycle,
which is why the lead article in this issue is an overview of that cycle.
Another popular question had to do with the proper method of correcting
names; a major article on that subject is planned for the next issue.
Many of the other articles in this issue were also inspired by user
questions.

Some other questions (summarized):

Q: Don't goofs take a lot of time to check?  Are people really that
interested?

A: They actually take more time to edit for readability and check for
duplicates than to research, but yes, our logs show that goofs are a
very popular part of the database.

Q: Do you accept submissions for animal performers?

A: We've recently changed our policy on this.  If an animal performer is
credited in the cast list, they can now be submitted with the regular cast
list.  If an animal performer is uncredited, or if their credit is buried
in the miscellaneous credits of a movie, then we do not accept them.
You should make your best guess of the actual gender of the animal when
determining whether to submit it to the actor or actress list.

Q: Why hasn't my miscellaneous crew submission appeared?

A: Backlog on this list was running about 4 weeks.  This has recently been
cleared and is now back to normal.

---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 2 - THE END

