Dangerous DBA A blog for those DBA's who live on the edge

Position Tracker – The Stub

May 15, 2020 11:05 am / dangerousDBA

Continuing from what I have been blogging about, this is the start of fleshing out the IOT data pipeline that is created in the Qwiklabs tutorial.

What did the original do:

The original file makes up some data using the Python random module. It generates a number of readings for a sensor based on the arguments that you pass in!
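A minimal sketch of that idea, not the tutorial's actual code: the function name, the field names and the 15 to 30 degree temperature range are all assumptions made for illustration.

```python
import random


def make_readings(sensor_id, num_readings, seed=None):
    """Return a list of made-up readings for one sensor.

    Hypothetical helper: the field names and the temperature range
    are assumptions, not taken from the tutorial script.
    """
    rng = random.Random(seed)  # passing a seed makes the output repeatable
    return [
        {
            "sensor": sensor_id,
            "reading": n,
            "temperature": round(rng.uniform(15.0, 30.0), 2),
        }
        for n in range(num_readings)
    ]


if __name__ == "__main__":
    for row in make_readings("sensor-1", 3, seed=42):
        print(row)
```

The number of readings plays the role of the command line argument mentioned above.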

Why did I want to change it:

I felt the dataset it produced was a little small and not very realistic. I wanted locations, both so that you could do cooler visualisations on the data and so that it made a more real-world example.

How did I change it:

The code can be found here: Position Tracker

TL;DR:

I created an additional script and copied the original under a new name. The new script reads a file and, driven by three parameters, generates additional devices and data. These are sent off to the Google IOT device registry, then on to Google PubSub and Google Dataflow, and finally into Google BigQuery, to be visualised (crudely) in Google Data Studio.

Screenshot of Google Datastudio of my initial data

Issues: VERY slow to generate the data; use dictionaries better, or another library such as Pandas?

Otherwise:

The first thing I did was create a new module called generate_data. This was going to hold the stub and any associated files and data that got produced. I also ran cp cloudiot_mqtt_example_json.py create-send-data.py so that I could have free rein to change what I needed to!

Next I got ambitious and thought about where I wanted to generate locations for, and I came up with the UK. A Google search turned up many suggested ways to do this in Python. They all had flaws though, so I decided on using actual postcode locations; the question was where to source them from.

I found a site, freemaptools.com, which has UK postcode data to download; it also seems to be refreshed frequently! So I got the file and inspected it: good data, in the format id,postcode,latitude,longitude
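A file in that format can be read with the standard csv module. This is only a sketch: the function name is hypothetical, the sample rows are invented rather than taken from the download, and whether the real file starts with a header row is an assumption (the code simply skips one if it is there).

```python
import csv
import io


def read_postcodes(lines):
    """Yield one dict per postcode row from an iterable of CSV lines."""
    for row in csv.reader(lines):
        # Skip blank lines and a header row, if the file has one.
        if not row or not row[0].strip().isdigit():
            continue
        yield {
            "id": int(row[0]),
            "postcode": row[1].strip(),
            "latitude": float(row[2]),
            "longitude": float(row[3]),
        }


# Invented sample rows in the same column order as the download.
SAMPLE = """id,postcode,latitude,longitude
1,AA1 1AA,51.5,-0.1
2,BB2 2BB,53.4,-2.2
"""

if __name__ == "__main__":
    for record in read_postcodes(io.StringIO(SAMPLE)):
        print(record)
```

For a file on disk, pass an open file handle (opened with newline="") in place of the StringIO object.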

I created a time monster:

In the new generate_data module I created generate_data_stub.py, which takes parameters for:

  1. The number of devices you wanted to generate data for
  2. The number of datapoints per device
  3. The filename of the output data
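Three parameters like these could be read with argparse. The flag names below are hypothetical, chosen for illustration; the real script may name or order them differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the three stub parameters (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Generate stub position data.")
    parser.add_argument("--num-devices", type=int, required=True,
                        help="number of devices to generate data for")
    parser.add_argument("--points-per-device", type=int, required=True,
                        help="number of datapoints per device")
    parser.add_argument("--output-file", required=True,
                        help="filename of the output data")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # An explicit list stands in for the real command line here.
    args = parse_args(["--num-devices", "10",
                       "--points-per-device", "100",
                       "--output-file", "positions.json"])
    print(args.num_devices, args.points_per_device, args.output_file)
```

Calling parse_args() with no argument reads the real command line instead.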

This in turn:

  1. Read the UK postcodes file and wrote a processed file in which each line is a Python dictionary.
  2. Read the new file and, for each first letter of the postcode, created a dictionary of the min and max id (for randomness later).
  3. Created a list of random device numbers.
  4. For each of the devices:
       a. Chose a random letter.
       b. Got the min and max of the ids from the dictionary built in step 2.
       c. Chose a random set of numbers in the range from step b.
       d. Found each id and its associated data in the processed file.
       e. Embellished the data with a temp and the mobile no.
       f. Wrote it to the output file.
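The letter dictionary and the random id choice from the steps above might look roughly like this. The function names are hypothetical and the sample rows are invented. The sketch also assumes the ids for one letter are contiguous (that is, the file is sorted by postcode), otherwise an id picked inside the range could belong to a different letter.

```python
import random


def build_letter_ranges(rows):
    """Map each postcode's first letter to the (min, max) id seen for it."""
    ranges = {}
    for row in rows:
        letter = row["postcode"][0].upper()
        low, high = ranges.get(letter, (row["id"], row["id"]))
        ranges[letter] = (min(low, row["id"]), max(high, row["id"]))
    return ranges


def pick_ids(ranges, how_many, rng):
    """Choose a random letter, then `how_many` random ids inside its range."""
    letter = rng.choice(sorted(ranges))
    low, high = ranges[letter]
    return letter, [rng.randint(low, high) for _ in range(how_many)]


if __name__ == "__main__":
    # Invented rows; only the id and postcode fields matter here.
    rows = [
        {"id": 1, "postcode": "AA1 1AA"},
        {"id": 2, "postcode": "AB2 2BB"},
        {"id": 3, "postcode": "BA3 3CC"},
    ]
    ranges = build_letter_ranges(rows)
    print(ranges)
    print(pick_ids(ranges, 4, random.Random(0)))
```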

Issue:

This works, but it is very time consuming and seems to be CPU bound.

What to try next:

I think there is very little need to keep writing all the data out to intermediate files. The work could be done in memory instead, either by making better use of dictionaries or by using a library such as Pandas.
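One possible shape for the dictionary route, as a sketch only: the postcode rows are grouped by first letter once, kept in memory, and the output is written in a single pass at the end. All names here are hypothetical, and the device identifier (a made-up 11-digit mobile number) and the temperature range are assumptions.

```python
import io
import json
import random
from collections import defaultdict


def group_by_letter(rows):
    """Group postcode rows by the first letter of the postcode."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["postcode"][0].upper()].append(row)
    return dict(groups)


def generate(rows, num_devices, points_per_device, rng):
    """Build every output record in memory; no intermediate files."""
    groups = group_by_letter(rows)
    letters = sorted(groups)
    records = []
    for _ in range(num_devices):
        # Assumption: a device is identified by a made-up mobile number.
        device = "07" + "".join(rng.choice("0123456789") for _ in range(9))
        letter = rng.choice(letters)
        for _ in range(points_per_device):
            place = rng.choice(groups[letter])  # direct lookup, no file scan
            records.append({
                "device": device,
                "postcode": place["postcode"],
                "latitude": place["latitude"],
                "longitude": place["longitude"],
                "temperature": round(rng.uniform(15.0, 30.0), 2),
            })
    return records


def write_json_lines(records, handle):
    """Write one JSON document per line to an open file-like object."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    sample = [
        {"id": 1, "postcode": "AA1 1AA", "latitude": 51.5, "longitude": -0.1},
        {"id": 2, "postcode": "BB2 2BB", "latitude": 53.4, "longitude": -2.2},
    ]
    buffer = io.StringIO()
    write_json_lines(generate(sample, 2, 3, random.Random(0)), buffer)
    print(buffer.getvalue())
```

Picking from a per-letter list replaces both the min and max id dictionary and the search through the processed file, which is where I would expect most of the time to go.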

Posted in: 2020, Big Data, BigQuery, GCP, Google, Position Tracker, Python / Tagged: bigdata, Google, IOT, python
