Talk:BPLAN Geography Data
Files Duplicated
There are a number of duplicate BPLAN files on the wiki, all of which are orphaned. Should the duplicates be deleted?
Main File | Duplicate File | Notes |
---|---|---|
File:Geography 20141214 to 20150516 from 20150302.txt.gz | File:20150302 ReferenceData.gz | Zipped contents identical |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | File:20160126 ReferenceData.gz | Zipped contents identical |
File:Geography 20161211 to 20171209 from 20170124.txt.gz | File:20170127 ReferenceData.gz | Zipped contents identical |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | File:20170830 ReferenceData.gz | Zipped contents differ: the main file is missing 1155 REF+SER records and 9 TLD records (its PIT footer is also inconsistent with its TLD record count). The PIF headers of the two files are identical. |
--ThomasWood (talk) 04:00, 22 February 2019 (UTC)
I'll delete the first three orphaned files and take a look at the last one.
--PeterHicks (talk) 14:57, 23 February 2019 (UTC)
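For anyone wanting to repeat the duplicate check, a minimal Python sketch is below. It assumes "contents identical" means the decompressed bytes match. The function names and the chunk size are illustrative and not taken from any existing tool. Hashing the decompressed stream rather than the .gz file itself avoids false differences, since two gzip files can hold identical data yet differ in header fields such as the stored timestamp or original file name.

```python
import gzip
import hashlib


def decompressed_sha256(path):
    """Return the SHA-256 hex digest of a gzip file's decompressed contents."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both gzip files decompress to byte-identical data."""
    return decompressed_sha256(path_a) == decompressed_sha256(path_b)
```

Calling `same_contents(main_path, duplicate_path)` on a downloaded pair (both paths are placeholders) should return `True` for a true duplicate, even when the two files were compressed at different times or under different names.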
Files with PIT record consistency errors
A number of the (non-duplicate) BPLAN data files have PIT record values inconsistent with the number of records they actually contain. Flagging this here in case an upstream data processing tool has undetected bugs that are causing these records to be dropped unintentionally. In particular, the 9 TLD entries missing from File:Geography 20161211 to 20171209 from 20170823.txt.gz appear to contain a character that is only valid in the Windows-1252 encoding, which may cause some tools to reject them. (The 9 missing rows were identified by comparing the two versions of this file published on the wiki, as noted in the previous section.)
File | Count Type | REF | TLD | LOC | PLT | NWK | TLK |
---|---|---|---|---|---|---|---|
File:20140116 ReferenceData.gz | PIT | 1405 | 628 | 10468 | 3487 | 37093 | 1047866 |
File:20140116 ReferenceData.gz | Actual | 247 | 628 | 10468 | 3487 | 37093 | 1047866 |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | PIT | 1422 | 610 | 10874 | 3663 | 39110 | 1071772 |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | Actual | 257 | 610 | 10873 | 3663 | 39108 | 1071769 |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | PIT | 262 | 638 | 11062 | 3948 | 40119 | 1101022 |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | Actual | 262 | 629 | 11062 | 3948 | 40119 | 1101022 |
--ThomasWood (talk) 04:00, 22 February 2019 (UTC)
- File:Geography 20181209 to 20190518 from 20180618.txt.gz also has an inconsistent PIT record: it reports that the file should contain 650 TLD records, but only 641 are present. --ThomasWood (talk) 15:47, 28 May 2019 (UTC)
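A rough Python sketch of how the "Actual" counts above could be reproduced, and how rows with non-UTF-8 bytes could be located, is given below. It assumes each record is one tab-separated line whose first field is the record type (REF, TLD, LOC and so on). The function names are illustrative. The declared counts are passed in by hand, because the sketch makes no assumption about the field layout of the PIT record itself. Lines are handled as raw bytes so that a Windows-1252 character cannot cause a row to be skipped during counting.

```python
import gzip
from collections import Counter


def count_records(path):
    """Count records per type and list lines that are not valid UTF-8.

    Assumes tab-separated lines with the record type in the first field.
    """
    counts = Counter()
    non_utf8 = []
    with gzip.open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            if not line:
                continue  # ignore blank lines
            record_type = line.split(b"\t", 1)[0].decode("ascii", "replace")
            counts[record_type] += 1
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                # Likely a single-byte encoding such as Windows-1252.
                non_utf8.append((line_number, record_type))
    return counts, non_utf8


def mismatches(declared, counts):
    """Return {type: (declared, actual)} for each type whose counts differ."""
    return {
        record_type: (expected, counts.get(record_type, 0))
        for record_type, expected in declared.items()
        if counts.get(record_type, 0) != expected
    }
```

For example, `mismatches({"TLD": 650}, counts)` with `counts` taken from `count_records(path)` would flag a file whose PIT record declares 650 TLD records but which contains a different number. The `non_utf8` list gives the line numbers worth inspecting for encoding problems.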