mission:log:2014:11:24:using-python-lxml-request-as-simple-scrape-robot-for-metrics-from-webpages, revision [2016-08-09 19:13] (current, "Updated VFCC links", chrono); previous revision [2015-01-09 09:42] ([The python solution], chrono)
===== In the beginning there was the copy =====
Even if it appears unique and original to us, there always was some other inspiration/
===== The Problem =====
Unfortunately, the data isn't accessible through an API, or at least as a JSON export of the raw data. This meant I needed to devise a robot that would periodically scrape the data from that web page, extract all needed values and feed them into the UCSSPM, so it could calculate with real data for reference. Once it has done all that, it has to push all usable raw data and the results of the UCSSPM prediction into an influxdb shard running on the stargazer, so that the data can be stored, queried and (re)viewed live on the following VFCC dashboards:
  * [[https://
  * [[https://
  * [[https://
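A minimal sketch of the scrape-and-extract step, assuming the values sit in a two-column HTML table of label/value rows. It sticks to the Python standard library (''urllib.request'' and ''html.parser'') so it runs anywhere; the python solution named in the page title uses requests and lxml instead, which cope much better with sloppy HTML. ''CellCollector'', ''fetch'', ''extract_values'' and the labels in the usage line are hypothetical, not the real page's layout.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download a page and decode it with the charset the server announces (UTF-8 fallback)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside of a cell (headings, whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_values(html, wanted):
    """Return {label: raw_text} for every row whose first cell is a wanted label."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows
            if len(row) >= 2 and row[0] in wanted}
```

Usage would be along the lines of ''extract_values(fetch(url), {"Air temperature", "Wind speed"})''. The values stay raw strings on purpose, so that sanitizing and converting them is a separate step.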
===== The bash solution =====
Every now and then the upstream data changed, introducing incomprehensible whitespace changes as a consequence, and sometimes it just delivered 999.9 values. A pain to maintain. And since most relevant values came as floats, there was no other solution than to use bc for floating-point math & comparisons,
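In Python, floats are native, so bc is no longer needed for math or comparisons. The sketch below assumes the whitespace noise can be handled by removing all whitespace from a cell, and treats 999.9 as a missing-data marker rather than a real reading; ''parse_reading'' and ''SENTINEL'' are hypothetical names.

```python
import math

SENTINEL = 999.9  # assumed: upstream delivers this value when a reading is missing


def parse_reading(raw, sentinel=SENTINEL):
    """Turn a scraped cell into a float, or None if it is empty, garbled or the sentinel."""
    text = "".join(raw.split())  # drop all whitespace, including stray inner spaces
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # Reject NaN/inf as well as the sentinel, so bogus numbers never reach the UCSSPM.
    if not math.isfinite(value) or math.isclose(value, sentinel):
        return None
    return value
```

Returning ''None'' instead of a number makes a missing reading explicit, and ordinary comparisons like ''value < 0.0'' replace the bc round trips.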
And finally, the data structure and shipping method to influxdb are more than questionable,
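One cleaner way to structure and ship the points, assuming a newer InfluxDB (0.9 or later, with the 1.x-style HTTP API) that accepts the line protocol on its ''/write'' endpoint. This is not necessarily what the robot used back then; the measurement, tag and field names, the database name and the URL are placeholders.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def _escape(text, specials):
    """Backslash-escape every character in `specials`, as the line protocol requires."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as 'measurement,tag=v field=1.0 <ns timestamp>'.

    Fields whose value is None (rejected readings) are skipped, so a missing
    value is simply absent instead of being stored as a bogus number.
    """
    field_parts = ["%s=%s" % (_escape(k, ",= "), repr(float(v)))
                   for k, v in sorted(fields.items()) if v is not None]
    if not field_parts:
        raise ValueError("line protocol needs at least one field")
    head = _escape(measurement, ", ")
    for k, v in sorted(tags.items()):
        head += ",%s=%s" % (_escape(k, ",= "), _escape(v, ",= "))
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return "%s %s %d" % (head, ",".join(field_parts), timestamp_ns)


def push(lines, base_url="http://localhost:8086", db="ucsspm"):
    """POST points to a 1.x-style /write endpoint (base_url and db are placeholders)."""
    url = "%s/write?%s" % (base_url, urlencode({"db": db}))
    body = "\n".join(lines).encode("utf-8")
    with urlopen(Request(url, data=body, method="POST"), timeout=10) as resp:
        return resp.status  # InfluxDB 1.x answers 204 No Content on success
```

Keeping raw data and prediction results as separate measurements, with the origin as a tag, would let both be queried side by side on the dashboards.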
===== The python solution =====