public class ScheduleDataProcessor
extends java.lang.Object
This class also processes schedule adherence information so one can determine what the schedule adherence was with the old stop_times and what it would be with the new ones. This allows one to see directly what kind of improvement using the new stop_times will provide. The schedule adherence information is included in the log file, not in the new stop_times files.
The results are output into the original GTFS directory as two new stop times files: stop_times.txt_new and stop_times.txt_extended. Note that the original stop_times.txt file will not be overwritten. If you want to use the new stop_times.txt_new file you need to change its name to stop_times.txt, thereby overwriting the old file.
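The rename step described above can be scripted. The following is a minimal sketch, not part of this class: the StopTimesInstaller name and the stop_times.txt.bak backup file are assumptions added for illustration, so that the original file is kept before it is overwritten.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of ScheduleDataProcessor.
final class StopTimesInstaller {
    /**
     * Copies stop_times.txt to stop_times.txt.bak (an assumed backup name)
     * and then moves stop_times.txt_new over stop_times.txt.
     */
    static Path install(Path gtfsDir) throws IOException {
        Path original = gtfsDir.resolve("stop_times.txt");
        Path updated = gtfsDir.resolve("stop_times.txt_new");
        Path backup = gtfsDir.resolve("stop_times.txt.bak");
        if (Files.exists(original)) {
            // Keep the old schedule so the change can be reverted.
            Files.copy(original, backup, StandardCopyOption.REPLACE_EXISTING);
        }
        // Overwrites the old file, as the description above requires.
        return Files.move(updated, original, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling StopTimesInstaller.install(Paths.get(gtfsDirectoryName)) after process() has finished would put the new stop times into use.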
The stop_times.txt_new file has exactly the same format as the standard stop_times.txt file. The stop_times.txt_extended file contains the standard data but also adds some very useful columns including the original stop time so you can see how much it is being changed, the min and max arrival & departure times, the standard deviation so you can see the distribution of times, and the number of data points so you can see how many trips were used to generate the values.
The order of the rows in the new stop_times files will not necessarily match the original stop_times file. If the original file is adequately ordered, meaning that the rows of each trip are grouped together and stop_sequence increases within each trip, then the new stop_times files can keep the same order. But if there are issues with the ordering of the data in the original stop_times file, which is somewhat common, then the data is first sorted so that the first stop of each trip can be determined, which is important when the GTFS data is frequency based. This leads to a different ordering for the stop_times.txt_new and stop_times.txt_extended files.
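One way to decide whether the original ordering is adequate is sketched below. This is an illustration under assumptions, not the check this class actually performs: the Row type and both method names are hypothetical, and "adequate" is taken to mean that each trip's rows are contiguous and stop_sequence strictly increases within the trip.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of ScheduleDataProcessor.
final class StopTimesOrderCheck {
    /** Minimal stand-in for one stop_times.txt row. */
    static final class Row {
        final String tripId;
        final int stopSequence;
        Row(String tripId, int stopSequence) {
            this.tripId = tripId;
            this.stopSequence = stopSequence;
        }
    }

    /** True if trips are grouped together and stop_sequence increases within each trip. */
    static boolean isAdequatelyOrdered(List<Row> rows) {
        Set<String> finishedTrips = new HashSet<>();
        String currentTrip = null;
        int lastSequence = Integer.MIN_VALUE;
        for (Row row : rows) {
            if (!row.tripId.equals(currentTrip)) {
                if (currentTrip != null) {
                    finishedTrips.add(currentTrip);
                }
                if (finishedTrips.contains(row.tripId)) {
                    return false; // the trip reappears after another trip started
                }
                currentTrip = row.tripId;
                lastSequence = Integer.MIN_VALUE;
            }
            if (row.stopSequence <= lastSequence) {
                return false; // stop_sequence does not increase
            }
            lastSequence = row.stopSequence;
        }
        return true;
    }

    /** Fallback: returns a copy sorted by trip and then by stop_sequence. */
    static List<Row> sortedByTripAndSequence(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((Row r) -> r.tripId)
                .thenComparingInt(r -> r.stopSequence));
        return copy;
    }
}
```

After sorting, the first row of each trip is its first stop, which is what the frequency based handling needs.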
To process the data this class reads in arrival and departure data from the database. It batch reads the data 500,000 datapoints at a time, a value chosen to make db reading quick (want a high number) without using too much heap memory at once (want a low number). The arrivals and departures data is read into maps of type Map<String, Map<TripStopKey, List<Integer>>> using readInArrivalsOrDeparturesFromDb(). The map is keyed on routeId so that each route can be handled separately (though this isn't truly needed). The data is simply stored as Integers indicating the time of day of the arrival or departure. Once this data is determined the ArrivalDeparture object is not needed anymore and can be garbage collected. When reading in departures it also puts the trip departure times into departureTimesFromTerminalMap so that elapsed time can be determined when frequency based trips are used.
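The nested map layout described above can be sketched as follows. This is an illustration only: ArrivalTimesStore and its Key class are hypothetical stand-ins (Key plays the role of TripStopKey), and storing the time of day as seconds into the day is an assumption, since the description only says the values are Integers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not part of ScheduleDataProcessor.
final class ArrivalTimesStore {
    /** Stand-in for TripStopKey: a map key made of trip ID and stop ID. */
    static final class Key {
        final String tripId;
        final String stopId;
        Key(String tripId, String stopId) {
            this.tripId = tripId;
            this.stopId = stopId;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return tripId.equals(other.tripId) && stopId.equals(other.stopId);
        }
        @Override public int hashCode() {
            return Objects.hash(tripId, stopId);
        }
    }

    // routeId -> (trip/stop key -> observed times of day)
    private final Map<String, Map<Key, List<Integer>>> timesByRoute = new HashMap<>();

    /** Records one observation; only the int is kept, not the source object. */
    void add(String routeId, String tripId, String stopId, int secondsIntoDay) {
        timesByRoute
                .computeIfAbsent(routeId, r -> new HashMap<>())
                .computeIfAbsent(new Key(tripId, stopId), k -> new ArrayList<>())
                .add(secondsIntoDay);
    }

    /** Returns the recorded times, or an empty list if there is no data. */
    List<Integer> times(String routeId, String tripId, String stopId) {
        Map<Key, List<Integer>> byTripStop = timesByRoute.get(routeId);
        if (byTripStop == null) {
            return Collections.emptyList();
        }
        List<Integer> times = byTripStop.get(new Key(tripId, stopId));
        return times == null ? Collections.<Integer>emptyList() : times;
    }
}
```

Because only the Integer is retained, the object each observation came from becomes unreachable once the batch has been consumed, which is what keeps heap use bounded between batches.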
Once all of the arrival and departure times have been processed into a map, statistics are used to determine the best arrival/departure time for the stop_times output. The goal is to use a time such that only a desired fraction of arrivals/departures will be early. For example, if you want only 20% of the vehicles to be early with respect to the schedule time, which is reasonable because for passengers it is better for vehicles to be late rather than early so they don't miss the vehicle, then the value should be 0.2. This desiredFractionEarly value is specified when the ScheduleDataProcessor object is constructed.
The way the software tries to achieve the desiredFractionEarly is by assuming there is a Gaussian distribution of the times. By using the standard deviation of a Gaussian distribution the software estimates the value to use such that desiredFractionEarly will be attained. Of course the distribution is not truly Gaussian. Therefore several iterations are used to adjust the value in order to get the desired results.
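The idea can be sketched as follows. This is not the actual implementation: the description does not say how each iteration adjusts the value, so the correction step here (shifting by the Gaussian z-score difference between the desired and the measured fraction) is an assumption, as are the class name, the method names, and the use of the Abramowitz and Stegun approximation 26.2.23 for the inverse normal CDF. An observation counts as early when it is before the candidate schedule time.

```java
// Hypothetical sketch, not part of ScheduleDataProcessor.
final class FractionEarlyEstimator {
    /** Approximate inverse standard normal CDF for 0 < p < 1 (Abramowitz and Stegun 26.2.23). */
    static double inverseNormalCdf(double p) {
        if (p > 0.5) {
            return -inverseNormalCdf(1.0 - p);
        }
        double t = Math.sqrt(-2.0 * Math.log(p));
        double num = 2.515517 + 0.802853 * t + 0.010328 * t * t;
        double den = 1.0 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t;
        return -(t - num / den);
    }

    /** Fraction of observed times that fall before the candidate schedule time. */
    static double fractionEarly(int[] times, double candidate) {
        int early = 0;
        for (int time : times) {
            if (time < candidate) {
                early++;
            }
        }
        return (double) early / times.length;
    }

    /** Expects a non-empty array and 0 < desiredFractionEarly < 1. */
    static double estimate(int[] times, double desiredFractionEarly,
                           int maxIterations, double tolerance) {
        double sum = 0.0;
        for (int time : times) {
            sum += time;
        }
        double mean = sum / times.length;
        double squares = 0.0;
        for (int time : times) {
            squares += (time - mean) * (time - mean);
        }
        double stdDev = Math.sqrt(squares / times.length);

        // Initial guess: the Gaussian quantile for the desired fraction.
        double zTarget = inverseNormalCdf(desiredFractionEarly);
        double candidate = mean + zTarget * stdDev;

        // The data is not truly Gaussian, so correct against the measured fraction.
        for (int i = 0; i < maxIterations; i++) {
            double actual = fractionEarly(times, candidate);
            if (Math.abs(actual - desiredFractionEarly) <= tolerance) {
                break;
            }
            double clamped = Math.min(0.999, Math.max(0.001, actual));
            candidate += (zTarget - inverseNormalCdf(clamped)) * stdDev;
        }
        return candidate;
    }
}
```

With desiredFractionEarly of 0.2 the initial guess lies roughly 0.84 standard deviations before the mean. The loop then moves the candidate earlier when too many observations are early and later when too few are.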
The results are then output into the stop_times.txt_new and stop_times.txt_extended files described above.
Modifier and Type | Class and Description |
---|---|
static class | ScheduleDataProcessor.TerminalDeparturesKey: Special MapKey class that ensures the proper key is used for the departureTimesFromTerminalMap map in this class. |
static class | ScheduleDataProcessor.TripStopKey: Special MapKey class that ensures the proper key is used for the several maps in this class. |
Constructor and Description |
---|
ScheduleDataProcessor(java.lang.String gtfsDirectoryName, java.util.Date beginTime, java.util.Date endTime, Time timeForUsingCalendar, double desiredFractionEarly, int allowableDifferenceFromMeanSecs, int allowableDifferenceFromOriginalTimeSecs, boolean doNotUpdateFirstStopOfTrip, int allowableEarlySecs, int allowableLateSecs) |
Modifier and Type | Method and Description |
---|---|
static ScheduleDataProcessor.TripStopKey | getTripStopKey(java.lang.String tripId, java.lang.String stopId): For use in the sub-maps of arrivalTimesFromDbByRouteByTripStopMap and departureTimesFromDbByRouteByTripStopMap. |
void | process(): Reads original stop_times.txt file, reads in arrival/departures from the database, processes the arrival/departure info to determine more accurate schedule times, and writes the results to new stop_times files. |
public ScheduleDataProcessor(java.lang.String gtfsDirectoryName, java.util.Date beginTime, java.util.Date endTime, Time timeForUsingCalendar, double desiredFractionEarly, int allowableDifferenceFromMeanSecs, int allowableDifferenceFromOriginalTimeSecs, boolean doNotUpdateFirstStopOfTrip, int allowableEarlySecs, int allowableLateSecs)
Parameters:
gtfsDirectoryName
beginTime
endTime
timeForUsingCalendar
desiredFractionEarly - the fraction of arrivals/departures that should be early
allowableDifferenceFromMeanSecs
allowableDifferenceFromOriginalTimeSecs
doNotUpdateFirstStopOfTrip
allowableEarlySecs
allowableLateSecs
public static ScheduleDataProcessor.TripStopKey getTripStopKey(java.lang.String tripId, java.lang.String stopId)
Parameters:
tripId
stopId
public void process()
For each trip/stop in the stop_times.txt file, checks whether there are AVL based arrival/departure times. If there are, they are used when creating the GtfsExtendedStopTime object. If there is no data for the trip/stop then null values are used. The resulting GtfsExtendedStopTimes are then written to two files:
- stop_times.txt_new, which uses the standard GTFS format
- stop_times.txt_extended, which has additional info such as standard deviation