ScheduleDataProcessor

java.lang.Object
- org.transitime.statistics.ScheduleDataProcessor

```
public class ScheduleDataProcessor
extends java.lang.Object
```
Processes arrival/departure times based on AVL data in order to determine more accurate schedule information. The results are output into new stop_times GTFS files. Data is handled on a per-trip basis. This means that for regular schedule-based systems that run a trip only once per day, you need several days of data to get multiple data points. But for frequency-based configurations, where a trip is run multiple times a day, each run of that trip for the day is processed into a single value.
This class also processes schedule adherence information so one can determine what the schedule adherence was with the old stop_times and what it would be with the new ones. This allows one to see directly what kind of improvement using the new stop_times will provide. The schedule adherence information is included in the log file, not in the new stop_times files.
The results are output into the original GTFS directory as two new stop times files: stop_times.txt_new and stop_times.txt_extended. Note that the original stop_times.txt file will not be overwritten. If you want to use the new stop_times.txt_new file you need to change its name to stop_times.txt, thereby overwriting the old file.
The stop_times.txt_new file has exactly the same format as the standard stop_times.txt file. The stop_times.txt_extended file contains the standard data but also adds some very useful columns including the original stop time so you can see how much it is being changed, the min and max arrival & departure times, the standard deviation so you can see the distribution of times, and the number of data points so you can see how many trips were used to generate the values.
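The extra columns described above are ordinary summary statistics over the observed times for one trip/stop. The following is a minimal sketch of how such values could be computed, not the actual implementation. The class name StopTimeSummary, the use of seconds into the day, and the choice of a population (rather than sample) standard deviation are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical helper; summarizes observed times (seconds into the day) for one trip/stop. */
final class StopTimeSummary {
    final int min;
    final int max;
    final double mean;
    final double stdDev;
    final int count;

    private StopTimeSummary(int min, int max, double mean, double stdDev, int count) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.stdDev = stdDev;
        this.count = count;
    }

    /** Computes min, max, mean, standard deviation and count; the list must not be empty. */
    static StopTimeSummary of(List<Integer> timesSecs) {
        if (timesSecs.isEmpty()) {
            throw new IllegalArgumentException("no data points");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        double sum = 0;
        for (int t : timesSecs) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
        }
        double mean = sum / timesSecs.size();
        double sumSq = 0;
        for (int t : timesSecs) {
            sumSq += (t - mean) * (t - mean);
        }
        // Assumption: population standard deviation (divide by n, not n - 1).
        double stdDev = Math.sqrt(sumSq / timesSecs.size());
        return new StopTimeSummary(min, max, mean, stdDev, timesSecs.size());
    }
}
```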
The order of the rows in the new stop_times files will not necessarily be the same as in the original stop_times file. If the ordering of the original stop_times file is adequate, meaning that trips are grouped together and the stop_sequence increases within each trip, then the new stop_times files can have the same order. But if there are issues with the ordering of the data in the original stop_times file, which is somewhat common, then the data is first sorted so that the first stops of trips can be determined, which is important when the GTFS data is frequency based. This leads to a different ordering for the stop_times.txt_new and stop_times.txt_extended files.
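The ordering requirement can be illustrated with a small sort: rows are grouped by trip and ordered by stop_sequence within each trip, so the first row of each group is the first stop of that trip. This is only a sketch. The StopTimeSorter and Row names are hypothetical, and ordering trips alphabetically by trip ID is an assumption (any ordering that keeps a trip's rows together would do).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of sorting stop_times rows by trip and stop_sequence. */
final class StopTimeSorter {

    /** Minimal stand-in for one stop_times.txt row. */
    static final class Row {
        final String tripId;
        final int stopSequence;
        final String stopId;

        Row(String tripId, int stopSequence, String stopId) {
            this.tripId = tripId;
            this.stopSequence = stopSequence;
            this.stopId = stopId;
        }
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Row> sortedByTripAndSequence(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((Row r) -> r.tripId)
                .thenComparingInt(r -> r.stopSequence));
        return copy;
    }
}
```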
To process the data, this class reads in arrival and departure data from the database. It batch reads the data 500,000 data points at a time, a value chosen to make database reads quick (which favors a high number) without using too much heap memory at once (which favors a low number). The arrival and departure data is read into maps of type Map<String, Map<TripStopKey, List<Integer>>> using readInArrivalsOrDeparturesFromDb(). The map is keyed on routeId so that each route can be handled separately (though this isn't truly needed). The data is simply stored as Integers indicating the time of day of the arrival or departure. Once this data is determined, the ArrivalDeparture object is no longer needed and can be garbage collected. When reading in departures, it also puts the trip departure times into departureTimesFromTerminalMap so that elapsed time can be determined when frequency-based trips are used.
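The nested map layout described above can be sketched as follows. This is an illustration under stated assumptions, not the class's code: the ArrivalTimesByRoute name and its methods are hypothetical, and a plain String stands in for TripStopKey to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a routeId -> (trip/stop key -> times of day) structure. */
final class ArrivalTimesByRoute {
    // Assumption: a String key stands in for the real TripStopKey class.
    private final Map<String, Map<String, List<Integer>>> timesByRoute = new HashMap<>();

    /** Records one observed time of day (in seconds) for a trip/stop on a route. */
    void add(String routeId, String tripStopKey, int timeOfDaySecs) {
        timesByRoute
                .computeIfAbsent(routeId, r -> new HashMap<>())
                .computeIfAbsent(tripStopKey, k -> new ArrayList<>())
                .add(timeOfDaySecs);
    }

    /** Returns the recorded times, or an empty list if nothing was recorded. */
    List<Integer> timesFor(String routeId, String tripStopKey) {
        Map<String, List<Integer>> byTripStop =
                timesByRoute.getOrDefault(routeId, Collections.emptyMap());
        return byTripStop.getOrDefault(tripStopKey, Collections.emptyList());
    }
}
```

Because only the Integer time of day is retained, the objects read from the database in each batch become unreachable once the batch has been folded into the map.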
Once all of the arrival and departure times have been processed into a map, statistics are used to determine the best arrival/departure time for the stop_times output. The goal is to use a time such that only a desired fraction of arrivals/departures will be early. For example, if you want only 20% of the vehicles to be early with respect to the schedule time, which is reasonable because for passengers it is better for vehicles to be late rather than early so they don't miss the vehicle, then the value should be 0.2. This desiredFractionEarly value is specified when the ScheduleDataProcessor object is constructed.
The software tries to achieve the desiredFractionEarly by assuming that the times follow a Gaussian distribution. Using the standard deviation of that distribution, the software estimates the value to use such that desiredFractionEarly will be attained. Of course the distribution is not truly Gaussian. Therefore several iterations are used to adjust the value in order to get the desired results.
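One way to picture this estimate: under a Gaussian assumption, the time at which a fraction f of observations is earlier is mean + z * stdDev, where z is the standard normal quantile for f (roughly -0.84 for f = 0.2). The sketch below starts from that value and then nudges the target fraction by the observed error for a few iterations. The adjustment rule, the class and method names, the bisection-based quantile, and the erf approximation (Abramowitz and Stegun 7.1.26) are all assumptions chosen for illustration; the actual iteration used by this class may differ.

```java
import java.util.List;

/** Hypothetical sketch; not the actual ScheduleDataProcessor algorithm. */
final class FractionEarlySketch {

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation. */
    static double normalCdf(double x) {
        double z = Math.abs(x) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * z);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-z * z);
        return x >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Inverse of normalCdf found by bisection; p must be strictly between 0 and 1. */
    static double inverseNormalCdf(double p) {
        double lo = -8.0;
        double hi = 8.0;
        for (int i = 0; i < 60; i++) {
            double mid = (lo + hi) / 2.0;
            if (normalCdf(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (lo + hi) / 2.0;
    }

    /** Fraction of observed times that fall before the candidate schedule time. */
    static double fractionEarly(List<Integer> timesSecs, double scheduleTimeSecs) {
        int early = 0;
        for (int t : timesSecs) {
            if (t < scheduleTimeSecs) {
                early++;
            }
        }
        return (double) early / timesSecs.size();
    }

    /** Gaussian first guess, then a few corrections of the target fraction. */
    static double estimateScheduleTime(List<Integer> timesSecs,
                                       double desiredFractionEarly,
                                       int iterations) {
        double sum = 0;
        for (int t : timesSecs) {
            sum += t;
        }
        double mean = sum / timesSecs.size();
        double sumSq = 0;
        for (int t : timesSecs) {
            sumSq += (t - mean) * (t - mean);
        }
        double stdDev = Math.sqrt(sumSq / timesSecs.size());

        double target = desiredFractionEarly;
        double best = mean;
        double bestError = Double.MAX_VALUE;
        for (int i = 0; i < iterations; i++) {
            double candidate = mean + inverseNormalCdf(target) * stdDev;
            double actual = fractionEarly(timesSecs, candidate);
            double error = Math.abs(actual - desiredFractionEarly);
            if (error < bestError) {
                best = candidate;
                bestError = error;
            }
            // Assumed adjustment rule: shift the target by the observed error.
            target = Math.min(0.999, Math.max(0.001, target + (desiredFractionEarly - actual)));
        }
        return best;
    }
}
```

The correction step matters most when the observed times are skewed or roughly uniform, where the pure Gaussian guess tends to land on the wrong side of the desired fraction.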
The results are then output into the stop_times.txt_new and stop_times.txt_extended files described above.

Author:

SkiBu Smith

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ScheduleDataProcessor.TerminalDeparturesKey` Special MapKey class that ensures the proper key is used for the departureTimesFromTerminalMap map in this class.
`static class`	`ScheduleDataProcessor.TripStopKey` Special MapKey class that ensures the proper key is used for the several maps in this class.

Constructor Summary

Constructors
Constructor and Description
`ScheduleDataProcessor(java.lang.String gtfsDirectoryName, java.util.Date beginTime, java.util.Date endTime, Time timeForUsingCalendar, double desiredFractionEarly, int allowableDifferenceFromMeanSecs, int allowableDifferenceFromOriginalTimeSecs, boolean doNotUpdateFirstStopOfTrip, int allowableEarlySecs, int allowableLateSecs)`

Method Summary

Modifier and Type	Method and Description
`static ScheduleDataProcessor.TripStopKey`	`getTripStopKey(java.lang.String tripId, java.lang.String stopId)` For use in the sub-maps of arrivalTimesFromDbByRouteByTripStopMap and departureTimesFromDbByRouteByTripStopMap.
`void`	`process()` Reads original stop_times.txt file, reads in arrival/departures from the database, processes the arrival/departure info to determine more accurate schedule times, and writes the results to new stop_times files.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ScheduleDataProcessor
```
public ScheduleDataProcessor(java.lang.String gtfsDirectoryName,
                             java.util.Date beginTime,
                             java.util.Date endTime,
                             Time timeForUsingCalendar,
                             double desiredFractionEarly,
                             int allowableDifferenceFromMeanSecs,
                             int allowableDifferenceFromOriginalTimeSecs,
                             boolean doNotUpdateFirstStopOfTrip,
                             int allowableEarlySecs,
                             int allowableLateSecs)
```
    Parameters:
    
    gtfsDirectoryName -
    
    beginTime -
    
    endTime -
    
    timeForUsingCalendar -
    
    desiredFractionEarly - the fraction of arrivals/departures that should be early with respect to the schedule time (for example, 0.2 for 20%)
    
    allowableDifferenceFromMeanSecs -
    
    allowableDifferenceFromOriginalTimeSecs -
    
    doNotUpdateFirstStopOfTrip -
    
    allowableEarlySecs -
    
    allowableLateSecs -
- Method Detail
  - getTripStopKey
```
public static ScheduleDataProcessor.TripStopKey getTripStopKey(java.lang.String tripId,
                                                               java.lang.String stopId)
```
    For use in the sub-maps of arrivalTimesFromDbByRouteByTripStopMap and departureTimesFromDbByRouteByTripStopMap. The key is simply tripId + stopId. Previously tripId + "=" + stopId was used, but the separator was dropped to make things as efficient as possible. This might not actually be a good idea, since it could make debugging a bit more difficult.
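    One property of a plain concatenated key is worth keeping in mind: without a separator, two different (tripId, stopId) pairs produce the same string whenever the characters line up. The sketch below contrasts that with a key that compares the two fields separately. TripStopKeySketch is a hypothetical class written for illustration and is not the TripStopKey class documented here.

```java
import java.util.Objects;

/** Hypothetical field-based key; not the actual ScheduleDataProcessor.TripStopKey. */
final class TripStopKeySketch {
    private final String tripId;
    private final String stopId;

    TripStopKeySketch(String tripId, String stopId) {
        this.tripId = Objects.requireNonNull(tripId);
        this.stopId = Objects.requireNonNull(stopId);
    }

    /** What a separator-free concatenated key would look like for this pair. */
    String concatenated() {
        return tripId + stopId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TripStopKeySketch)) {
            return false;
        }
        TripStopKeySketch that = (TripStopKeySketch) other;
        return tripId.equals(that.tripId) && stopId.equals(that.stopId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tripId, stopId);
    }

    @Override
    public String toString() {
        // Keeps the two parts visible, which helps when debugging.
        return tripId + "=" + stopId;
    }
}
```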
    
    Parameters:
    
    tripId -
    
    stopId -
    
    Returns:
  - process
```
public void process()
```
    Reads original stop_times.txt file, reads in arrival/departures from the database, processes the arrival/departure info to determine more accurate schedule times, and writes the results to new stop_times files.
    For each trip/stop in the stop_times.txt file, checks whether there are AVL-based arrival/departure times. If there are, they are used when creating the GtfsExtendedStopTime object. If there is no data for the trip/stop, null values are used. The resulting GtfsExtendedStopTimes are then written to two files:
    - stop_times.txt_new, which uses the standard GTFS format
    - stop_times.txt_extended, which has additional info such as standard deviation
