Hi, I'm Andrew Ganse. I'm an applied physicist and data scientist, lately leading and building out machine learning and data analysis systems at Seattle-area startups. In both the commercial world and the academic science world, my work has focused on machine learning, inverse problems, optimization, signal processing, and data analysis. (My PhD, as well as much of my work at APL-UW, concerned inverse problems in geophysics.) Here on research.ganse.org are some of my publicly shareable research results, toy problems, and tools, in both data science and applied physics topics. You can contact me at andrew@ganse.org.
FLOW_MODELS: IMAGE GENERATION AND ANOMALY DETECTION AS TWO SIDES OF THE SAME COIN
Normalizing flow models are invertible neural networks (INNs), a type of generative model that offers a nice two-for-one benefit: they enable unsupervised learning for image anomaly detection (by mapping unlabeled images to a distribution where statistical anomaly detection techniques can apply) and also image simulation (by mapping randomly generated samples from a probability distribution into the image space). Now it may not be my ultimate use case, but it turns out there are a ton of cat images and datasets on the internet, so let's experiment with INNs and cats!
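To make the two-for-one idea concrete, here is a deliberately tiny sketch: a one-dimensional affine flow with a standard-normal base distribution. It is only a toy under stated assumptions, not the image model itself. A real flow for images stacks many learned invertible layers, and the class name `AffineFlow1D` and the example numbers are hypothetical. The same invertible map is used in both directions: forward to score how unusual a data point is, inverse to generate new samples.

```python
import math
import random


class AffineFlow1D:
    """Toy invertible map z = (x - shift) / scale with a standard-normal base."""

    def __init__(self, shift, scale):
        self.shift = shift
        self.scale = scale

    @classmethod
    def fit(cls, data):
        # For this one-layer affine flow, the maximum-likelihood fit is simply
        # the sample mean and the (population) standard deviation.
        mean = sum(data) / len(data)
        var = sum((x - mean) ** 2 for x in data) / len(data)
        return cls(mean, math.sqrt(var))

    def forward(self, x):
        # Data space -> latent space (used for anomaly scoring).
        return (x - self.shift) / self.scale

    def inverse(self, z):
        # Latent space -> data space (used for simulation).
        return z * self.scale + self.shift

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, 1) + log |dz/dx|.
        z = self.forward(x)
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_base - math.log(self.scale)

    def sample(self, rng):
        # Draw from the base distribution and push it through the inverse map.
        return self.inverse(rng.gauss(0.0, 1.0))


rng = random.Random(0)
train = [rng.gauss(5.0, 2.0) for _ in range(2000)]  # stand-in "unlabeled data"
flow = AffineFlow1D.fit(train)

typical, outlier = 5.5, 25.0  # hypothetical test points
print("log p(typical):", flow.log_prob(typical))
print("log p(outlier):", flow.log_prob(outlier))
print("three simulated samples:", [flow.sample(rng) for _ in range(3)])
```

A low log-probability flags a point as anomalous, while `sample` plays the role of image generation. In the image setting both directions run through the same trained network.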
GETTING MLFLOW+DATABASE RUNNING QUICKLY VIA DOCKER
This provides a get-running-quickly Docker-compose setup using containers for MLflow, PostgreSQL, and NGINX. Run MLflow's database in PostgreSQL, and put an NGINX reverse proxy in front of the MLflow website to allow some level of access restriction (say for a workgroup within an already-firewalled company intranet).
DBSCAN CLUSTERING IN DECRYPTING AN IMAGE CIPHER
This wonderful kids' book series is fun not only for the stories themselves, but also because each of the first several books involves a cipher puzzle with "fairy hieroglyphics" - I love code puzzles! In the electronic form of the books I discovered the hieroglyphic sequence was moved to the back of the book, all perfectly lined up in matrices over a few pages at the end. And I thought, hey, that seems like it'd be easy to parse and decrypt on a computer, just like the main character did!
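For readers who haven't met DBSCAN, here is a minimal pure-Python sketch of the algorithm on made-up 2D points. It is not the code from the project: the points stand in for hypothetical per-glyph feature vectors, and `eps` and `min_samples` are illustrative values. The appeal for a cipher like this is that DBSCAN groups look-alike symbols without being told in advance how many distinct symbols there are, and it labels stray points as noise.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Hypothetical feature vectors: three tight groups of "glyphs" plus one stray.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.1), (10.0, 0.1),
    (20.0, 20.0),
]
labels = dbscan(points, eps=0.3, min_samples=2)
print(labels)
```

Once every glyph carries a cluster label, the sequence of labels can be treated like any substitution cipher text.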
ELECTROMAGNETIC INVERSION OF ESTUARINE SALINITY STRUCTURE USING SMALL-SCALE CSEM
The Conductivity Profiler is an instrument for remotely observing estuarine salinity profiles via electromagnetic measurements. Electromagnetic (EM) waves are attenuated in seawater as a function of frequency, so conductivity structure (closely related to salinity structure) in the water can be inferred by combining measurements of EM waves at different frequencies at a distant electric-field receiver. Geophysical inversion methods are applied to estimate the estuarine salinity profile from the EM measurements. Using inverse theory techniques, we take advantage of statistical rigor and let the data determine the structure of the conductivity profile, and we quantify the uncertainty and resolution of the salinity profile.
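The frequency dependence of the attenuation can be illustrated with the textbook skin-depth formula for a plane wave in a uniform good conductor, delta = sqrt(2 / (omega * mu_0 * sigma)). This is only a back-of-envelope sketch and not the instrument's forward model, which would need a dipole source and layered water. The conductivity values (0.5 S/m for brackish water, 4 S/m for open seawater), the frequencies, and the 20 m range are assumed for illustration.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m


def skin_depth(frequency_hz, conductivity_s_per_m):
    """Distance (m) over which a plane wave's amplitude drops by a factor of e."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 / (omega * MU_0 * conductivity_s_per_m))


def amplitude_fraction(distance_m, frequency_hz, conductivity_s_per_m):
    """Fraction of the original amplitude remaining after distance_m."""
    return math.exp(-distance_m / skin_depth(frequency_hz, conductivity_s_per_m))


# Assumed illustrative values: brackish vs. open-ocean conductivity, 20 m range.
for sigma in (0.5, 4.0):
    for freq in (10.0, 100.0, 1000.0):
        print(
            f"sigma={sigma} S/m  f={freq:7.1f} Hz  "
            f"skin depth={skin_depth(freq, sigma):7.2f} m  "
            f"amplitude left at 20 m={amplitude_fraction(20.0, freq, sigma):.4f}"
        )
```

Higher frequencies and saltier water both shorten the skin depth, which is why measurements across several frequencies carry information about how conductivity varies along the path.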
MEDICAL IMAGE CLASSIFICATION BUILT WITH "MLFLOW PROJECTS"
Let's explore a supervised learning problem in medical imaging based on a public dataset and MLflow's "Projects" functionality. A self-contained modeling module is trained, has its performance logged in MLflow, and can be checked out as a deployable model image. There's a configurable implementation of this in my aganse/py_tf2_gpu_dock_mlflow repo. Let's try the malaria detection dataset from TensorFlow Datasets, a balanced, labeled set of about 27,000 thin blood smear slide images of cells, and see how well we can detect malaria parasite presence in the images. This dataset is used to train and test different variations of image classification models, including VGG-16 and various sizes of more basic convolutional networks.
GPT_CLIENT CLI WITH PARAMETER CONTROL, WEBLINK SUBMISSION, & SYNTAX HIGHLIGHTING
I have found OpenAI's GPT models to be fabulously productive tools and use them often in my technical work now. But getting what I want out of them has meant accessing the models through the API rather than the ChatGPT website GUI. This allows me to change some of the model parameters, format the output as I wish, and run the whole thing in my terminal. Of course, the process of making the app has also been a highly useful education in how the models work, including how interacting with them via the API can enable no end of use cases from other automated code.
PREDICTING BANK LOAN BEHAVIOR WITH RANDOM FOREST MODELS
Let's implement a random forest classifier from Scikit-Learn to see how well we can predict whether a bank client will have good loan behavior (meaning they won't default or become delinquent) if they are given a new loan. We'll use a public bank transactions/loans dataset from the PKDD99 Challenge conference for the modeling. In the process we'll fit the model, explore the assumptions made for it, and learn about some limitations of Scikit-Learn's tree-based models.
INTERACTIVE GPS DATA VISUALIZATIONS IN PYTHON/JUPYTER
Did you know you can plot your geographic data on interactive maps embedded directly in your Python notebooks? Check it out, as we play with and analyze some GPS tracking data. A database of tracked walking routes available on a health/fitness website provides a convenient trove of data, not only to play with but also to explore the geometric interference effects of downtown buildings upon GPS track solutions.
RADIO SCIENCE GRAVITY INVERSION FOR ICY MOON INTERNAL STRUCTURE
The nature of an icy satellite's interior relates fundamentally to its composition, thermal structure, formation and evolution history, and prospects for supporting life. Gravity measurements via radio Doppler information during spacecraft flybys are an important tool used to infer gross interior structure of these moons. Liquid water and ice layers have previously been inferred for the interiors of Jupiter's icy satellites Europa, Ganymede, and Callisto on the basis of magnetic field measurements by the Galileo probe. On Europa and Callisto, induced magnetic field signatures measured by Galileo provided strong evidence for an ionic aqueous ocean. We apply geophysical inverse theory tools to assess the interior density anomaly distribution of an icy moon that could be estimated from radio Doppler measurements, to support the search for mass anomalies in the ice shell (meteorites or diapiric upwellings) or near the H2O/rock interface (seamounts).