Talk:BPLAN Geography Data
Latest revision as of 15:47, 28 May 2019
Files Duplicated
There are a number of duplicate BPLAN files on the wiki, all of which are orphaned. Should the duplicates be deleted?
Main File | Duplicate File | Notes |
---|---|---|
File:Geography 20141214 to 20150516 from 20150302.txt.gz | File:20150302 ReferenceData.gz | Zipped contents identical |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | File:20160126 ReferenceData.gz | Zipped contents identical |
File:Geography 20161211 to 20171209 from 20170124.txt.gz | File:20170127 ReferenceData.gz | Zipped contents identical |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | File:20170830 ReferenceData.gz | Zipped contents differ: the main file lacks 1155 REF+SER records and 9 TLD records, and its PIT footer is inconsistent with its TLD record count. The PIF headers of the two files are identical. |
--ThomasWood (talk) 04:00, 22 February 2019 (UTC)
I'll delete the first three orphaned files and take a look at the last one.
--PeterHicks (talk) 14:57, 23 February 2019 (UTC)
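The "Zipped contents identical / differ" comparison in the table above can be reproduced with a short script. The following is a minimal Python sketch, not the tool used to build the table; the function names are hypothetical, and it assumes each line of a BPLAN file is one record whose first tab-separated field is the record type (e.g. `REF`, `TLD`). It compares the decompressed bytes rather than the `.gz` files themselves, because two gzip files with identical contents can still differ byte-for-byte (the gzip header can store a modification time and original file name).

```python
import gzip
import sys
from collections import Counter


def read_records(path):
    """Decompress a .gz file and return its lines as raw bytes.

    Lines are kept undecoded so that non-UTF-8 characters cannot cause a read error.
    """
    with gzip.open(path, "rb") as f:
        return f.read().splitlines()


def diff_records(main, dup):
    """Return (only_in_main, only_in_dup) as Counters of lines, ignoring line order."""
    main_counts, dup_counts = Counter(main), Counter(dup)
    # Counter subtraction keeps only positive counts, so repeated lines are handled.
    return main_counts - dup_counts, dup_counts - main_counts


def by_record_type(lines):
    """Tally lines by record type, assumed to be the first tab-separated field."""
    return Counter(line.split(b"\t", 1)[0].decode("ascii", "replace") for line in lines)


if __name__ == "__main__":
    main, dup = read_records(sys.argv[1]), read_records(sys.argv[2])
    if main == dup:
        print("Zipped contents identical")
    else:
        only_main, only_dup = diff_records(main, dup)
        print("Only in main file:", dict(by_record_type(only_main.elements())))
        print("Only in duplicate:", dict(by_record_type(only_dup.elements())))
```

Run it with the main file and the suspected duplicate as the two arguments. The per-type tallies show at a glance which record types (for example REF, SER or TLD) are missing from one version.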
Files with PIT record consistency errors
A number of the (non-duplicate) BPLAN data files have PIT record values that are inconsistent with the number of records they actually contain. Flagging this up here in case an upstream data processing tool has an undetected bug that is causing these records to be dropped unintentionally. In particular, the 9 missing TLD entries from the File:Geography 20161211 to 20171209 from 20170823.txt.gz file appear to contain a character that is valid only in the Windows-1252 encoding, which may cause some tools to reject them. (The 9 missing rows were identified by comparing the two versions of this file published on the wiki, as noted in the previous section.)
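One way to locate lines like these 9 TLD entries is to flag every line that is not valid UTF-8 and show how it decodes as Windows-1252 (Python codec `cp1252`). This is a minimal sketch under the assumption that the files are otherwise intended to be ASCII or UTF-8; it says nothing about how the upstream tool actually handles the data, and the function name is hypothetical. Windows-1252 assigns printable characters (curly quotes, dashes and so on) to most bytes from 0x80 to 0x9F, which are not valid on their own in UTF-8, so such lines fail the first decode.

```python
import gzip
import sys


def cp1252_only_lines(lines):
    """Yield (line_number, text) for lines that are not valid UTF-8.

    The text is the Windows-1252 decoding where possible, otherwise the raw bytes' repr.
    """
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            try:
                yield number, raw.decode("cp1252")
            except UnicodeDecodeError:
                # Some bytes (e.g. 0x81) are undefined in cp1252 as well.
                yield number, repr(raw)


if __name__ == "__main__":
    with gzip.open(sys.argv[1], "rb") as f:
        for number, text in cp1252_only_lines(f.read().splitlines()):
            print(number, text)
```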
File | Count Type | REF | TLD | LOC | PLT | NWK | TLK |
---|---|---|---|---|---|---|---|
File:20140116 ReferenceData.gz | PIT | 1405 | 628 | 10468 | 3487 | 37093 | 1047866 |
File:20140116 ReferenceData.gz | Actual | 247 | 628 | 10468 | 3487 | 37093 | 1047866 |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | PIT | 1422 | 610 | 10874 | 3663 | 39110 | 1071772 |
File:Geography 20151213 to 20160514 from 20160126.txt.gz | Actual | 257 | 610 | 10873 | 3663 | 39108 | 1071769 |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | PIT | 262 | 638 | 11062 | 3948 | 40119 | 1101022 |
File:Geography 20161211 to 20171209 from 20170823.txt.gz | Actual | 262 | 629 | 11062 | 3948 | 40119 | 1101022 |
--ThomasWood (talk) 04:00, 22 February 2019 (UTC)
- File:Geography 20181209 to 20190518 from 20180618.txt.gz also has an inconsistent PIT record: it reports that the file should contain 650 TLD records, but only 641 are present. --ThomasWood (talk) 15:47, 28 May 2019 (UTC)
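A check along the lines of the PIT vs Actual table above can be sketched in Python. This is a minimal sketch, not the method used to produce those figures, and the names are hypothetical. It assumes (not confirmed on this page) that records are tab-delimited, that the first field is the record type, and that the PIT trailer's remaining fields are the expected counts in the same order as the table columns (REF, TLD, LOC, PLT, NWK, TLK). Check the BPLAN specification before relying on that layout.

```python
import gzip
import sys
from collections import Counter

# ASSUMPTION: order of the counts in the PIT record, matching the table columns above.
PIT_FIELDS = ["REF", "TLD", "LOC", "PLT", "NWK", "TLK"]


def check_pit(lines):
    """Return {record_type: (pit_count, actual_count)} for every type where they differ."""
    actual = Counter()
    pit = None
    for raw in lines:
        fields = raw.rstrip(b"\r\n").split(b"\t")
        record_type = fields[0].decode("ascii", "replace")
        if record_type == "PIT":
            # Assumed layout: PIT<TAB>REF count<TAB>TLD count<TAB>... (int() accepts bytes)
            counts = (int(v) for v in fields[1:1 + len(PIT_FIELDS)])
            pit = dict(zip(PIT_FIELDS, counts))
        else:
            actual[record_type] += 1
    if pit is None:
        raise ValueError("no PIT record found")
    return {t: (pit[t], actual[t]) for t in PIT_FIELDS if t in pit and pit[t] != actual[t]}


if __name__ == "__main__":
    with gzip.open(sys.argv[1], "rb") as f:
        for record_type, (expected, found) in check_pit(f.read().splitlines()).items():
            print(f"{record_type}: PIT says {expected}, file contains {found}")
```

If the assumed PIT layout is right, running this on the 20180618 file would be expected to report a TLD mismatch of 650 expected against 641 found.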