Continuing from what I have been blogging about, this is the start of fleshing out the IoT data pipeline that is created in the Qwiklabs tutorial.
What did the original do:
The original file (<– link just there) makes up some data using the Python random module. It generates a number of readings for a sensor based on the arguments that you pass in!
Why did I want to change it:
I felt the dataset it produced was a little small and not very realistic. I wanted locations so that you could do cooler visualisations with it, and so that it would be a more real-world example.
How did I change it:
The code can be found here: Position Tracker
TL;DR:
I made a renamed copy of the original and created an additional script. The new script reads a file and, given three parameters, generates additional devices and data. The data is then sent off to the Google IoT device registry, subsequently through Google Pub/Sub and Google Dataflow, and finally into Google BigQuery to be visualised (crudely) in Google Data Studio.

Issues: VERY slow to generate the data; make better use of dictionaries, or use another library such as Pandas?
Otherwise:
The first thing I did was create a new module called generate_data. This was going to hold the stub and any associated files and data that got produced. I also ran cp cloudiot_mqtt_example_json.py create-send-data.py so that I could have free rein to change what I needed to!
Next I got ambitious and asked myself where I wanted to generate locations for, and I came up with the UK. A Google search turned up many suggested ways to do this in Python. They all had flaws though, so I decided on using actual postcode locations. The question was where to source them from.
I found a site, freemaptools.com, which had UK postcode data to download; it also seems to be refreshed frequently! So I downloaded the file and inspected it: good data in the format id,postcode,latitude,longitude
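To make that format concrete, here is a minimal sketch of reading such rows with Python's csv module. The sample rows and the parse_postcodes name are made up for illustration and are not taken from the real download; the sketch also assumes the file starts with a header line naming the four columns.

```python
import csv
import io

# Made-up sample in the id,postcode,latitude,longitude format (not real data).
SAMPLE = """id,postcode,latitude,longitude
1,AA1 1AA,57.10,-2.24
2,AA1 1AB,57.08,-2.25
3,BB2 2BB,53.75,-2.48
"""


def parse_postcodes(lines):
    """Turn CSV lines into a list of dictionaries with typed values."""
    rows = []
    for row in csv.DictReader(lines):
        rows.append({
            "id": int(row["id"]),
            "postcode": row["postcode"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
        })
    return rows


# An in-memory file is used here; an open file handle would work the same way.
postcodes = parse_postcodes(io.StringIO(SAMPLE))
print(postcodes[0])
```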
I created a time monster:
In the new generate_data module I created generate_data_stub.py, taking parameters for:
- The number of devices you wanted to generate data for
- The number of datapoints per device
- The filename of the output data
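Those three parameters could be wired up with argparse along these lines. This is only a sketch: the flag names (--num_devices, --points_per_device, --output_file) and the defaults are my own guesses for illustration, not necessarily what the real script uses.

```python
import argparse


def build_parser():
    """Build a parser for the three parameters (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Generate fake device location data.")
    parser.add_argument("--num_devices", type=int, default=10,
                        help="number of devices to generate data for")
    parser.add_argument("--points_per_device", type=int, default=100,
                        help="number of datapoints per device")
    parser.add_argument("--output_file", default="device_data.json",
                        help="filename of the output data")
    return parser


# An explicit argument list is parsed here so the example runs anywhere;
# a real script would call build_parser().parse_args() with no arguments.
args = build_parser().parse_args(
    ["--num_devices", "5", "--points_per_device", "20", "--output_file", "out.json"]
)
print(args.num_devices, args.points_per_device, args.output_file)
```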
This in turn:
- Read the UK postcodes file and processed it into a new file of Python dictionary lines.
- Read the new file and, for each first letter of the postcode, created a dictionary of the min and max id (for randomness later).
- Created a list of random device numbers.
- For each of the devices:
  - Chose a random letter.
  - Got the min and max of the ids for that letter from the dictionary built in the second step.
  - Chose a random set of numbers in that min-to-max range.
  - Found each id and its associated data in the processed file.
  - Added a temperature and the mobile number to the data.
  - Wrote it to the output file.
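The min/max lookup and the random pick can be sketched roughly as follows. The rows and the function names are made up for illustration, and the sketch assumes that ids are contiguous within each first letter (as they would be if the file is sorted by postcode); if they are not, a random id in the range might not belong to that letter.

```python
import random

# Made-up rows standing in for the processed postcode file.
ROWS = [
    {"id": 1, "postcode": "AA1 1AA"},
    {"id": 2, "postcode": "AA1 1AB"},
    {"id": 3, "postcode": "BB2 2BA"},
    {"id": 4, "postcode": "BB2 2BB"},
]


def id_range_by_letter(rows):
    """Map each first letter of a postcode to the (min, max) id seen for it."""
    ranges = {}
    for row in rows:
        letter = row["postcode"][0]
        low, high = ranges.get(letter, (row["id"], row["id"]))
        ranges[letter] = (min(low, row["id"]), max(high, row["id"]))
    return ranges


def pick_random_ids(ranges, count, rng):
    """Choose a random letter, then `count` random ids inside its id range."""
    letter = rng.choice(sorted(ranges))
    low, high = ranges[letter]
    return letter, [rng.randint(low, high) for _ in range(count)]


rng = random.Random(42)  # fixed seed so runs are repeatable
ranges = id_range_by_letter(ROWS)
letter, ids = pick_random_ids(ranges, 3, rng)
print(ranges)
print(letter, ids)
```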
Issue:
This works, but it is very time consuming and seems to be CPU bound.
What to try next:
I think there is very little need to keep writing all the data out to intermediate files; more of the work could be done in memory, either by making better use of dictionaries or by using a library such as Pandas.
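As a rough idea of the dictionary route, the sketch below keeps the postcodes in a dictionary keyed by id, so each lookup is a direct dictionary access rather than a search through a file, and the output is serialised once at the end. Everything here is an assumption for illustration: the sample data, the generate_readings name, the four-digit device numbers and the temperature range are all made up, and I have not measured this against the current script.

```python
import json
import random

# Made-up postcode data keyed by id, loaded once and kept in memory.
POSTCODES = {
    1: {"postcode": "AA1 1AA", "latitude": 57.10, "longitude": -2.24},
    2: {"postcode": "AA1 1AB", "latitude": 57.08, "longitude": -2.25},
    3: {"postcode": "BB2 2BB", "latitude": 53.75, "longitude": -2.48},
}


def generate_readings(postcodes, num_devices, points_per_device, rng):
    """Build every reading in memory using dictionary lookups only."""
    ids = list(postcodes)
    readings = []
    # Hypothetical four-digit device numbers, unique per device.
    for device in rng.sample(range(1000, 10000), num_devices):
        for _ in range(points_per_device):
            place = postcodes[rng.choice(ids)]  # direct lookup, no file scan
            readings.append({
                "device": device,
                "temperature": round(rng.uniform(-5.0, 30.0), 1),  # assumed range
                **place,
            })
    return readings


rng = random.Random(7)  # fixed seed so runs are repeatable
readings = generate_readings(POSTCODES, num_devices=2, points_per_device=3, rng=rng)
# Serialise once at the end, one JSON object per line.
output = "\n".join(json.dumps(reading) for reading in readings)
print(output)
```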