Difference between revisions of "Darwin:Push Port"

From Open Rail Data Wiki

Revision as of 19:51, 5 November 2019

The Darwin Push Port is an XML push feed that continuously streams information about the creation of, and changes to, train schedule records, together with train running predictions made by Darwin.

The data is made available through http://opendata.nationalrail.co.uk.

To use the Push Port you must build a database capable of capturing extremely high volumes of information, as well as a query engine to draw information from that database. A large amount of interpretation work is involved; however, this allows substantial flexibility to apply the information to any product within the limits of your own infrastructure.

Data

The Push Port has two components:

  • Timetable and Timetable Reference Data.
  • Real-Time Update Data.

All Darwin data is gzipped (except for the Darwin Status Topic). XSDs for the interface are available, along with the specification.
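
Because payloads arrive gzipped, a consumer normally has to decompress each one before parsing the XML. The following is a minimal sketch of that step using only the Python standard library. The function name decode_darwin_payload and the sample document are illustrative assumptions, not part of Darwin; the check on the gzip magic bytes is simply one way to let uncompressed Status Topic messages through unchanged.

```python
import gzip
import xml.etree.ElementTree as ET


def decode_darwin_payload(raw: bytes) -> ET.Element:
    """Decompress a payload if it is gzipped, then parse it as XML."""
    # gzip streams start with the magic bytes 0x1f 0x8b; anything else is
    # treated as plain XML (for example, Status Topic messages).
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return ET.fromstring(raw)


# Round trip with a made-up document, not a real Darwin message.
sample = gzip.compress(b"<Pport><uR/></Pport>")
root = decode_darwin_payload(sample)
print(root.tag)  # Pport
```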

Timetable and Timetable Reference Data

Darwin makes Timetable and Timetable Reference Data available as static files that are usually generated daily. The creation of new Timetable and Timetable Reference files is announced via TimeTableId messages in the real-time Update Data.

Timetables

Timetable data contains a set of schedules covering at least a 48-hour period held in the Darwin database. This list of schedules provides the basis on which a Darwin snapshot can be applied.

The schedules in the timetable do not include forecast or actual times, although they reflect the latest state Darwin held when the timetable file was generated, so any schedule changes, new schedules, false destinations, cancellations and associations will be included.

Reference Data

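The Timetable Reference Data contains the following data referenced in timetables:

  • TIPLOCs, CRS codes, TOC codes and location names
  • TOC codes, names and website URLs
  • Late Running reason codes and text
  • Cancellation reason codes and text
  • Via locations
  • CIS codes and names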

Update Data

Darwin makes available real-time updates that alert the user to changes in the state of the Darwin database, or the creation of new Timetable and Timetable Reference Data. Darwin exposes two message topics:

  • Darwin Live Feed Topic
  • Darwin Status Topic

Darwin Live Feed Topic

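The Live Feed Topic exposes all update messages. Update messages contain one or more of the following elements:

  • Schedule data
  • Association data
  • Actual and Forecast data
  • Train order data
  • Station Messages
  • Train Alerts
  • Tracking ID corrections
  • Alarms
  • Schedule formation
  • Loading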

The Live Feed Topic also exposes TimeTableId messages that announce the creation of a new Timetable or Timetable Reference file.

Status Messages

The Status message topic contains status messages about the health and state of the Update Data. The possible messages are:

HBINIT 
The upstream live feed is running but is initialising its timetable.
HBFAIL 
The upstream live feed is shutting down.
HBPENDING 
The upstream live feed is operating, but part of the system is currently in failover mode. Data may be queued for a short period. Clients may remain connected and data will be delivered when available.
SNAPSHOT 
The Darwin Live Feed has encountered a discontinuity of messages from upstream and is starting a snapshot to re-sync its state.
SHUTTING-DOWN 
Darwin is shutting down and the message topics will soon become unavailable.

Usage

Subscribing to Darwin

The Darwin Push Port is made available through http://opendata.nationalrail.co.uk. By creating an account, you can register for a subscription to the Darwin feed.

Once you have an active Darwin subscription, the My Feeds page displays the following details:

Darwin File Information 
This section provides user details for accessing the Timetable and Timetable Reference Data via an Amazon S3 Bucket.
Darwin FTP Information 
This section provides user details for accessing snapshots and 5-minute logs of the real-time Update Data via FTP.
Darwin Topic Information 
This section provides user details for accessing real-time Update Data via OpenWire and STOMP message topics.

Important - NRDP accounts expire after an extended period without use. The expiry period for unused accounts is currently 30 days. If you create an account and do not consume any of the feeds during this time, your account will be deleted. When an account is deleted you will receive a notification email, and you can then re-register for a new account.

How do I consume the data?

Timetable and Reference Data

Timetable and Reference data can be obtained via an Amazon S3 Bucket. You will be required to connect and authenticate to S3 via the details given in Darwin File Information on your My Feeds page.

Keeping up to date

Timetable and Reference Data is usually updated daily. When a new Timetable or Timetable Reference file becomes available, the real-time topic sends a TimeTableId message that identifies the new file by name.

A separate TimeTableId message will be sent for each individual Timetable or Reference Data file that becomes available. Thus, multiple TimeTableId messages will be generated in succession, one for each Timetable and Reference file schema version.

Note that due to existing schema limitations, the TimeTableId message has mandatory attributes for timetable file and timetable reference data file names. Since the TimeTableId notification message is only reporting the presence of a single file, only one of these attributes will be populated with a valid file name. The other attribute will consist only of white space.
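
A consumer therefore has to work out which of the two attributes actually carries a file name. The sketch below shows one way to do that. The attribute names ttfile and ttreffile, the sample message and the function name are assumptions made for illustration; check them against the XSDs before relying on them. Element names are matched on their local part so that the sketch does not depend on a particular schema namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def new_file_from_timetable_id(xml_text: str) -> Optional[Tuple[str, str]]:
    """Return (kind, file_name) for the single file a TimeTableId announces."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Strip any "{namespace}" prefix and compare on the local name only.
        if elem.tag.rsplit("}", 1)[-1] != "TimeTableId":
            continue
        # Assumed attribute names; the unused one holds only white space.
        for kind, attr in (("timetable", "ttfile"), ("reference", "ttreffile")):
            value = elem.get(attr, "").strip()
            if value:
                return kind, value
    return None


# Made-up message for illustration only.
message = '<Pport><TimeTableId ttfile=" " ttreffile="example_ref.xml.gz"/></Pport>'
print(new_file_from_timetable_id(message))
```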

Update Data via FTP

The FTP server provides non real-time Update Data for users that missed the real-time updates. All files are gzipped.

Darwin regularly creates Snapshot files, containing the entire state of Darwin at a given point in time. The latest snapshot file is available over FTP for end users.

Live Feed data received since the last snapshot is written to log files, each covering 5 minutes, which are also available over FTP.

Real-Time Update Data via OpenWire & STOMP Message Topics

The Darwin Live Feed Topic and Darwin Status Messages Topic are exposed via ActiveMQ and accept connections over OpenWire or STOMP. The connection credentials are shown in the Darwin Topic Information section of your My Feeds page.

STOMP and OpenWire allow durable and non-durable subscriptions. If you would like Darwin to retain messages for you while you are disconnected, use a durable subscription. Please note that message retention is limited; it is intended to cover short-term subscriber failure, not long-term message persistence.

Important: The following must be true when connecting to a Darwin Topic:

  • Watching Advisory Topics must be turned off.
  • If you are using a Durable Subscriber, your Client ID must begin with your username.
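
For STOMP clients, the Client ID rule can be sketched as a pair of header sets. The header names client-id (on CONNECT) and activemq.subscriptionName (on SUBSCRIBE) are the ones ActiveMQ documents for durable STOMP subscriptions; the function name, the example username and the choice of suffix are assumptions for illustration. The only requirement taken from this page is that the Client ID begins with your username.

```python
from typing import Dict, Tuple


def durable_stomp_headers(username: str, suffix: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build CONNECT and SUBSCRIBE headers for a durable STOMP subscription."""
    # The Client ID must begin with the account username.
    client_id = f"{username}-{suffix}"
    connect_headers = {"client-id": client_id}
    # Reusing the Client ID as the subscription name is a convenience, not a rule.
    subscribe_headers = {"activemq.subscriptionName": client_id}
    return connect_headers, subscribe_headers


# "exampleuser" is a placeholder, not a real account.
connect, subscribe = durable_stomp_headers("exampleuser", "consumer1")
print(connect)    # {'client-id': 'exampleuser-consumer1'}
print(subscribe)  # {'activemq.subscriptionName': 'exampleuser-consumer1'}
```

These dictionaries would be passed to whichever STOMP library you use when connecting and subscribing.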

Detecting Real-Time Discontinuity

Each Update message contains a SequenceNumber header. The sequence number runs from 0 to 9,999,999. Upon reaching the end of this range the sequence number wraps around to 0.

NRDP guarantees that messages are produced with consecutive sequence numbers; a missing sequence number therefore indicates a missed message.

For example, if you received the following sequence numbers in order:

   0, 1, 2, 4, 5, 6

Then you have missed the message with sequence number 3.
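
The same check can be sketched in code, including the wrap-around from 9,999,999 to 0. The function names are illustrative, and the sketch assumes that messages arrive in order and without duplicates.

```python
from typing import List

SEQUENCE_MODULUS = 10_000_000  # sequence numbers run 0..9,999,999, then wrap to 0


def missed_between(previous: int, current: int) -> int:
    """Count the messages missing between two consecutively received numbers."""
    return (current - previous - 1) % SEQUENCE_MODULUS


def find_gaps(received: List[int]) -> List[int]:
    """List every missing sequence number in an ordered run of received numbers."""
    gaps = []
    for previous, current in zip(received, received[1:]):
        for offset in range(1, missed_between(previous, current) + 1):
            gaps.append((previous + offset) % SEQUENCE_MODULUS)
    return gaps


print(find_gaps([0, 1, 2, 4, 5, 6]))      # [3]
print(find_gaps([9_999_998, 0, 1]))       # [9999999]
print(missed_between(9_999_999, 0))       # 0
```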

Filtering

If you wish to, you may filter the Darwin Live Feed by message type using JMS Selectors on the MessageType header. Available message types and their respective codes include:

Description                                                        Code
Schedule updates (consisting of Schedule, DeactivatedSchedule)     SC
Association updates                                                AS
Schedule formations                                                SF
Train order                                                        TO
Actual and Forecast Information                                    TS
Loading                                                            LO
Station messages                                                   OW
Notifications (consisting of TrainAlert, TrackingID, RTTIAlarm)    NO

Please note that if you choose to filter messages, you will not be able to detect discontinuities in the Darwin feed.
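
As a rough illustration, a selector expression for a chosen set of message types could be assembled as follows. The function name is an assumption; the IN operator is standard JMS selector syntax, and the codes are those listed in this section. With STOMP on ActiveMQ the resulting string is typically supplied in a selector header on the SUBSCRIBE frame.

```python
MESSAGE_TYPE_CODES = {"SC", "AS", "SF", "TO", "TS", "LO", "OW", "NO"}


def build_selector(*codes: str) -> str:
    """Build a JMS selector that matches the given MessageType codes."""
    if not codes:
        raise ValueError("at least one message type code is required")
    unknown = sorted(set(codes) - MESSAGE_TYPE_CODES)
    if unknown:
        raise ValueError(f"unknown message type codes: {unknown}")
    quoted = ", ".join(f"'{code}'" for code in codes)
    return f"MessageType IN ({quoted})"


print(build_selector("TS", "SC"))  # MessageType IN ('TS', 'SC')
```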

Good Practice

You should follow the good practice guide when using this service.

Examples

Code examples for STOMP clients are available on GitHub.

The advanced usage page contains examples of some advanced applications for the data feeds, including bridging the ActiveMQ feeds to your own messaging server.

Version 12 Support

Push Port v12 is no longer available as of mid-May 2019.

Support

If you are having problems with the feeds:

  • First, read this wiki - there's a lot of material here that will help you
  • Check Twitter to see if an issue has been reported
  • If you want to discuss your problem with other people working with the service, the openraildata-talk group on Google Groups will be useful
  • Finally, if you're still having a problem, email dsg_nrdp.support@caci.co.uk

