Infrastructure Engineer
Division: DATUM, Impac Exploration Services
Location: Oklahoma City (OK), Houston (TX)
Type: Full-Time
Build Infrastructure for the Next Generation of Industrial AI
We're looking for an infrastructure engineer who gets excited about making AI work in the real world—not just in pristine data centers.
You'll architect and build infrastructure that bridges the gap between cutting-edge ML research and production deployments. This isn't your typical DevOps role—you'll be creating novel architectures and solving challenges that sit at the intersection of high-performance computing, distributed systems, and industrial operations.
The Real Environment
You'll be designing and building from first principles, iterating rapidly based on what our researchers need and what reality demands. If you thrive when given a complex problem and the freedom to solve it your way, you'll love this.
We move fast. Ship fast. Learn fast. Your architecture sketch from Monday might be in production by Friday.
What You'll Own
•Novel infrastructure architectures that don't exist elsewhere
•Systems design from whiteboard to production deployment
•Platform decisions that shape how we scale
•Infrastructure that makes our data scientists dangerously productive
•The technical foundation for AI that works where others can't
•Building the playbook others will eventually copy
Technical Stack & Expertise
Hardware/Compute:
•NVIDIA GPUs (A100, H100, A6000) and their quirks
•GPU and cluster interconnects (NVLink, InfiniBand)
•Server platforms (Dell PowerEdge, HPE Apollo, Supermicro)
•Understanding of CUDA, memory hierarchies, and GPU optimization
Orchestration & Containers:
•Kubernetes in production, not just tutorials (see the short sketch after this list)
•Container runtimes (Docker, containerd, CRI-O)
•Service mesh (Istio, Linkerd)
•Helm, Kustomize, or similar for deployment management
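To give a concrete flavor of this kind of work (purely illustrative, not a screening exercise), here is a minimal sketch of launching a GPU-backed pod with the official Kubernetes Python client. The namespace, container image, and GPU resource name are assumptions for illustration, not a description of our actual stack.

```python
# Illustrative only: launch a GPU-backed pod with the official Kubernetes
# Python client. Namespace, image, and resource names are assumptions.
from kubernetes import client, config


def launch_gpu_pod(name: str = "demo-trainer") -> None:
    # Load credentials from ~/.kube/config (use load_incluster_config()
    # instead when running inside a cluster).
    config.load_kube_config()

    container = client.V1Container(
        name=name,
        image="nvcr.io/nvidia/pytorch:24.01-py3",  # assumed image tag
        command=["python", "-c", "import torch; print(torch.cuda.is_available())"],
        resources=client.V1ResourceRequirements(
            # GPU resource exposed by the NVIDIA device plugin / GPU Operator
            limits={"nvidia.com/gpu": "1"}
        ),
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name, labels={"app": name}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)


if __name__ == "__main__":
    launch_gpu_pod()
```

In practice you'd fold this kind of logic into Helm charts, operators, or schedulers rather than ad-hoc scripts; the point is comfort moving between the API, the manifests, and the hardware underneath.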
Infrastructure & Networking:
•Terraform, Ansible, or Pulumi for IaC
•BGP, VXLAN, and software-defined networking
•Load balancing at layers 4 and 7
•Storage solutions (Ceph, MinIO, NetApp)
ML Infrastructure:
•Kubeflow, MLflow, or similar ML platforms
•GPU scheduling (NVIDIA GPU Operator, MIG)
•Distributed training frameworks
•Model serving infrastructure (Triton, TorchServe); a short client-side sketch follows this list
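On the serving side, here is a similarly hedged sketch of querying a Triton Inference Server over HTTP with the tritonclient package. The server URL, model name, and tensor names are hypothetical placeholders.

```python
# Illustrative only: query a Triton Inference Server over HTTP.
# Server URL, model name, and tensor names are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient


def infer(url: str = "localhost:8000", model: str = "demo_model") -> np.ndarray:
    triton = httpclient.InferenceServerClient(url=url)

    # Build a single FP32 input tensor; name and shape must match the
    # model's config.pbtxt on the server.
    batch = np.random.rand(1, 16).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    result = triton.infer(model_name=model, inputs=[infer_input])
    return result.as_numpy("OUTPUT__0")


if __name__ == "__main__":
    print(infer().shape)
```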
You're Our Person If
•You see undefined requirements as creative freedom
•You've built infrastructure without Stack Overflow because no one's solved it yet
•"It's never been done" sounds like a challenge, not a warning
•You can move from architecture diagrams to kubectl commands
•Complex distributed systems are your canvas
•You can explain your choices without defaulting to "best practices"
Especially If
•You've built GPU clusters that actually stayed up
•You've created systems that surprised even you with what they could do
•You understand when to build vs. buy vs. fork
•You've made infrastructure decisions with incomplete information—and been right
•You can prototype in the morning and production-harden in the afternoon
•You've worked where "good enough" isn't
The Opportunity
This is a chance to build without bureaucracy. You'll:
•Define architectures that become the standard for industrial AI
•Work directly with ML researchers who push your systems to their limits
•Make decisions that would take months of committees elsewhere
•Build infrastructure that enables entirely new capabilities
•Create systems that work where cloud providers fear to tread
Why This Hits Different
•No legacy systems to maintain or migrate
•Budget to build right, not just cheap
•Direct line from your ideas to production
•Team that understands infrastructure enables everything else
•Problems that haven't been solved before
•Freedom to define how industrial AI infrastructure should work
Ready?
Show us infrastructure you've built that others said was impossible. Tell us about a time you threw out the playbook and built something better. Share your thoughts on where ML infrastructure is heading.
We're looking for builders who see constraints as design inspiration, not limitations.
We are not currently sponsoring visas or participating in CPT programs.