Data

How can publicly available data sources help us better understand the world we live in?

Data never sleeps. In 2021, the world’s more than five billion internet users consumed an estimated 79 zettabytes of data, from Amazon to Zoom.* As new data sources become more widely accessible to human and machine analytics, the main challenge is not so much data management as data discernment. That is:

How do we make more judicious choices about our data production, curation, and consumption that are consistent with our democratic values?

You’ve certainly heard it before, but let’s just reiterate for good measure: labeled datasets are the backbone of most machine learning systems. However, the lack of high-quality, permissively licensed training data continues to be a barrier to entry for DIYers interested in building and deploying smart applications and systems.

Since its inception, IQT Labs has been deeply involved in improving open source access to high-quality data. We began by painstakingly building and releasing hand-curated datasets and by exploring the utility of synthetic data production. At the moment we are slightly obsessed with the trend toward automatic curation of omniscient labeled data and the transformative role these methods may play in the future.

Data creation remains a focus, and so do the open source tools and methods that help us better understand the world we live in. There is so much to explore, just in the data that surrounds us.

* That’s 79 × 10^21, or 79,000,000,000,000,000,000,000, bytes, expected to more than double to 180 zettabytes by 2025.

Related Content

Deep Geolocation with Satellite Imagery of Ukraine

  • Blog, Data

As the Russian invasion of Ukraine plays out before the world on social media, it’s critical to verify the places where the photos/videos were recorded. This blog explores our attempt to geolocate these social media photos by using satellite imagery and the power of deep learning.

Fusing Without Confusing

  • Blog, Data

Humans use all available senses to identify the world around them, a concept often called "sensor fusion" in Machine Learning. This blog focuses on how video and audio sensors can help compensate for each other's weaknesses when data used for identification is impaired.

Fusing Without Confusing – A Unique Data Processing Pipeline for VoxCeleb2

  • Blog, Data

We developed a DataLoader that can feed audio and video into a single machine learning model without additional processing, the first of its kind! Fusing Without Confusing uses multi-modal deep learning to fuse audio and video while ignoring background noise that can confuse machine learning models.

  • BLOG

The Unforgiving Asymptote of Chasing Machine Learning Gains

Verifying the geographic location of outdoor images

  • Data

Some of the many challenges that users of machine learning run into are hidden feedback loops, undeclared consumers, data dependencies, and configuration issues. Read more to learn about two case studies that can help improve and optimize machine learning.

  • Report

Where in the World Report

  • Data, Investigate

Verifying the geographic location of outdoor images.

  • Report

Synthesizing Robustness Report

  • Data, Investigate

Is your synthetic data letting you down? This report explores whether generative models can increase the utility of synthetic data for rare object detection.

  • Project

Where in the World

Verifying the geographic location of outdoor images

  • Data

Suppose that a photograph has surfaced under dubious circumstances, raising the question of where it was really taken. One potential solution, cross-view image geolocalization (CVIG), is the process of geolocating an outdoor photograph by comparing it to satellite imagery of possible locations. This project touched on each major component of CVIG deep learning.

  • Project

Synthesizing Robustness

Improving synthetic data with generative deep learning networks

  • Data

It is generally thought that a good AI model needs a lot of good data. But what about when it is infeasible or unreasonable to collect such a large dataset? How best to leverage small datasets for machine learning tasks is an active area of exploration. Synthetic data has the potential to alleviate object rarity and long-tail distributions, provided it introduces more signal than noise into the system.

  • BLOG

Introducing Atlas: A Tool for Visualizing Deep Tech Relationships at Scale

  • Blog, Data

Map out your startup ecosystem. Analyze your citation and investing networks. With Atlas, a graph of connected nodes can map almost anything.

  • Project

SkyScan

Automatic collection of labeled datasets

  • Data, Edge

Building and labeling image datasets for machine learning applications can be prohibitively time-consuming and expensive. SkyScan demonstrates a low-cost system that can capture images of aircraft in flight and automatically label each image with captured metadata.

  • BLOG

Measuring Global Bio-Power: The Strengths and Limits of Using Open Source Software Metadata to Analyze Worldwide Technical Trends

  • Blog, Data

Interested in global technology trends but frustrated by the shortcomings of patent and citation data? Learn how we analyzed global bioinformatics trends using open source software metadata, and see what tactics you can borrow.

  • BLOG

Synthesizing Robustness YOLTv4 Results Part 2: Dataset Size Requirements and Geographic Insights

  • Blog, Data

In previous blog posts we discussed the dataset and initial aggregate results for our “Synthesizing Robustness” project. Read further to learn about detailed results for object detection models, focusing on geographic disparities and individual object classes.

  • BLOG

Synthesizing Robustness YOLTv4 Results Part 1: The Nuances of Extracting Utility from Synthetic Data

  • Blog, Data

Is your synthetic data letting you down? Our new blog explores our Synthesizing Robustness project and whether domain adaptation strategies can improve the efficacy of the synthetic data when it comes to localizing rare objects.

  • BLOG

The Geography of Open Source Data Science: Mapping Anaconda Code Contributors

  • Blog, Data

Curious about the global production patterns of open source Python data science packages? Our new blog details how we created a dataset of 25,000 contributors and then analyzed contributor and software metadata to understand global data science patterns.

  • BLOG

Synthesizing Robustness

  • Blog, Data

Is your synthetic data letting you down? For the Synthesizing Robustness project, we explored whether generative models can increase the utility of synthetic data for rare object detection.

  • Git Repo

Where in the World GitHub Repo

  • Build, Data

Tools and capabilities for cross-view image geolocalization.

  • Git Repo

ViziFlu GitHub Repo

  • Build, Data

An open-source visualization tool to help make multi-model seasonal influenza forecasts more actionable for decision-makers.

  • Git Repo

Atlas GitHub Repo

  • Build, Data

Atlas is a visualization tool for creating customizable, interactive graphs in a browser using Python and Dash Cytoscape.

  • Open Source Dataset

RarePlanes Dataset

  • Data

A unique, open source, hand-labeled dataset that incorporates both real and synthetically generated satellite imagery of aircraft.

  • Open Source Dataset

VOiCES Dataset

  • Data

Hand-labeled speech data in acoustically challenging, reverberant environments.

  • Open Source Dataset

SpaceNet Dataset

  • Data

The largest hand-labeled open dataset of commercial satellite imagery. IQT Labs was a co-founder and managing partner of SpaceNet from 2016 to 2021.

  • Tool for Data Storytelling

ViziFlu

  • Data

A visualization tool that can display multiple influenza models and allow users to compare the uncertainty across those models.

  • Tool for Data Storytelling

dataviz.cafe

  • Data

IQT’s searchable catalog of Open Source Data Visualization Software

Copyright © 2022 · IQT Labs LLC.  All Rights Reserved.