Andrew Vincent O'Connor
Senior Site Reliability Engineer | AI Security, Multi-Cloud & ML Infrastructure
andrewoconnor@outlook.com
301-624-9886
Summary
Senior Site Reliability Engineer focused on AI security, regulated ML infrastructure, and multi-cloud platform engineering. 10+ years in software and infrastructure engineering, with recent work securing PHI-processing medical AI systems across AWS and GCP in HIPAA, SOC 2, HITRUST, and FDA quality-controlled environments. Hands-on experience with private SageMaker inference, no-egress model containers, AI coding-agent guardrails, workload identity federation, CI/CD supply-chain hardening, vulnerability remediation, GPU training clusters, medical image de-identification, and production observability.
Experience
Senior Site Reliability Engineer
March 2024 - Present | Imagen Technologies

• Own secure deployment, reliability, performance, and cloud infrastructure for FDA-approved, AI-powered medical devices processing PHI in HIPAA, SOC 2, HITRUST, and FDA quality-controlled environments, partnering with SRE, AI, data, engineering, security, quality, and regulatory teams.

• Architected private-network SageMaker inference for a custom medical imaging foundation model that generates preliminary radiology report text from patient X-rays; secured 3 endpoints across 3 AWS accounts with PrivateLink, private subnets, no-egress security groups, narrowly scoped Lambda invocation permissions, and KMS-encrypted ECR images.

• Productionized fine-tuned vLLM inference by packaging 100GB+ Docker images, publishing through GitHub Actions, and deploying isolated SageMaker model containers while removing public internet exposure for PHI-bearing inference traffic.

• Created first-of-its-kind AI coding-agent guardrails for Claude Code, GitHub Copilot, and Cursor in production Terraform and AWS Lambda repos, requiring read-only credentials, PR review, and Atlantis delivery while blocking secrets/PHI exposure, destructive Terraform actions, and mutating AWS CLI actions.

• Built a GCP Dataflow pipeline to de-identify and catalog 1.67PB of medical images for AI training, removing DICOM metadata, burned-in PHI, and sensitive filenames while producing audit manifests, enforcing least-privilege worker IAM, using encrypted GCS, and running workers in private subnets; reduced run costs by ~75%.

• Provisioned secure 256-GPU NVIDIA H100 Slurm training infrastructure on GCP with private subnet isolation, firewall controls, encrypted access-controlled shared storage, access/job logging, and Terraform-managed infrastructure for production AI research workloads.

• Implemented GCP Workload Identity Federation for EKS-hosted Atlantis and Terraform deployments, avoiding long-lived service account keys, enforcing least-privilege cross-cloud role mappings, and introducing Terraform-based IaC management for GCP environments.

• Strengthened ML/cloud vulnerability and supply-chain security with AWS Inspector/ECR scans, Snyk, Trivy, Dependabot, Terraform/IaC scanning, CVE remediation, hardened AMIs/containers/EC2 instances, immutable patching pipelines, and SHA-pinned GitHub Actions.

• Applied AWS Bedrock Data Automation to extract text from image-based medical reports for downstream AI/analytics workflows, and led a Lambda-based inference initiative that improved scalability while reducing cost by 23% versus EC2.

Site Reliability Engineer
August 2022 - March 2024 | Imagen Technologies

• Created reusable Terraform modules adopted across 20+ AWS accounts to standardize CloudWatch dashboards, GitHub Actions OIDC federation, AWS Lambda infrastructure, and AWS Verified Access endpoints for the broader engineering organization.

• Led OIDC-based AWS authentication rollout across ~10 GitHub Actions repos, replacing stored AWS keys with short-lived federated credentials, repo/branch-restricted IAM trust policies, least-privilege access, branch protection, CODEOWNERS, required reviews, and separated Terraform plan/apply workflows.

• Implemented zero-trust access for sensitive internal applications using AWS Verified Access and Okta, replacing VPN-based access for Tableau and Terraform drift detection tools used by ~20–30 users.

• Built centralized observability, alerting, uptime dashboards, and incident runbooks for ~30 production systems across 3–4 teams using CloudWatch, Okta-secured access, and PagerDuty, improving incident visibility and MTTR.

• Supported security investigations by helping the security team locate and analyze relevant data from CloudWatch Insights, CloudTrail, AWS Security Hub, and production telemetry logs.

• Automated AWS operational workflows with Step Functions, Lambda, and Systems Manager for EC2 patching and production operations; extended Terraform CI/CD across AWS/GCP, modernized the EKS delivery platform to reduce cost by ~83%, and built immutable image pipelines for reliable releases and patch management.

Technical Consultant
February 2021 - August 2022 | Philips

• Installed and configured the PerformanceBridge radiology analytics platform and supporting systems across Linux and Windows Server environments for healthcare customers in multiple regions.

• Resolved complex customer concerns and technical issues through deep investigation, research, and reproduction.

• Automated recurring configuration tasks, created procedural documentation, and provided training to colleagues, improving operational consistency and onboarding.

• Supported product expansion into EMEA and APAC through client implementations and technical configuration for organizations including:

  • medneo GmbH
  • King Faisal Specialist Hospital, Saudi Arabia
  • Chiba University Hospital
Software Engineer
March 2020 - February 2021 | Azenta / Brooks Life Sciences

• Built high-performance cloud-based applications for biobanking and laboratory automation.

• Won the employee Key Strategy Award for contributions to COVID-19 projects.

• Streamlined COVID-19 testing and reporting workflows, increasing efficiency and reducing errors for global biobanking processes.

• Designed a sample lineage interface for the UK NHS COVID-19 system, enabling the processing of hundreds of thousands of tests per week.

• Developed a sample normalization workflow for Curative Korva Labs, which conducted 20% of all COVID-19 tests in California, and integrated it with state and local public health departments for results reporting.

Software Engineer
December 2017 - February 2020 | RURO Inc

• Built Rails-based software for clients including Roche and the National Institutes of Health.

• Reduced sample turnaround times for multiple clients from weeks to hours.

• Designed integrations with hardware devices and external systems including:

  • Scientific instruments (ASTM E1394)
  • Billing interfaces (HL7)
  • Payment systems (Stripe)
  • EHR/EMR systems (Epic)
  • Insurance preauthorizations
Senior Associate
June 2016 - December 2017 | Avalere Health

• Designed and built analytics tools for the post-acute care industry.

• Built web scrapers to collect plan data from state health insurance exchanges.

• Rebuilt a Tableau-based product using open-source software and on-demand cloud solutions, lowering costs by 50%.

Junior Software Engineer
June 2015 - May 2016 | RURO Inc

• Designed and developed a medical sample inventory application for an Android RFID reader.

• Integrated with hardware devices including:

  • Barcode readers
  • Signature pads
  • Robotic freezers
Education
Towson University
January 2010 - December 2012 | Towson, MD
Bachelor of Science in Computer Science
Skills
AI / ML Security & Infrastructure
AWS SageMaker, vLLM, private AI inference, no-egress model containers, Amazon ECR, AWS Bedrock Data Automation, Slurm, NVIDIA H100, GPU training infrastructure, medical imaging foundation models, de-identification pipelines, AI coding-agent guardrails
Cloud Security / Zero Trust
AWS PrivateLink, VPC endpoints, private subnets, security groups, IAM least privilege, KMS, AWS Verified Access, Okta, GCP Workload Identity Federation, cross-cloud role mappings, workload identity, access controls
Application / Supply Chain Security
AWS Inspector, ECR image scanning, Snyk, Trivy, Dependabot, Terraform/IaC scanning, CVE remediation, hardened Linux AMIs, hardened containers, hardened EC2 instances, immutable image pipelines, SHA-pinned GitHub Actions
Infrastructure as Code / CI/CD
Terraform, Atlantis, CloudFormation, Ansible, GitHub Actions, CodeBuild, EC2 Image Builder, CircleCI, OIDC federation, branch protection, CODEOWNERS, required reviews, multi-cloud infrastructure delivery
Observability / Incident Response
CloudWatch dashboards, CloudWatch Insights, CloudTrail, AWS Security Hub, PagerDuty, monitoring, alerting, uptime dashboards, incident response runbooks, production telemetry, reliability engineering
Backend / Data
Python, Ruby, PostgreSQL, Nginx, GCP Dataflow, GCS, AWS Lambda, Step Functions, Systems Manager, Docker, Linux
Healthcare / Compliance
HIPAA-regulated systems, SOC 2, HITRUST, FDA quality controls, PHI, medical imaging data, radiology workflows, clinical document processing, HL7, Epic, laboratory workflows
Certifications
AWS Certified Solutions Architect - Associate
Projects
drumroll.world website
drumroll.world
Explore drumming history visually
Languages
English
Native
Spanish
Native