GeologicAI is a fast-growing technology company developing and deploying exciting new technologies for the energy and mining sectors. We build innovative geological robots that scan rocks, train AI to analyze the scan data, and make groundbreaking software that makes all our results incredibly useful for finding and extracting natural resources. We are well-funded, growing rapidly, and looking for amazing people to join our team.
We are hiring a mid- to senior-level DevOps specialist to take ownership of our build pipeline and deployment operations across a fleet of remote scanning/processing trailers that acquire and process large volumes of hyperspectral imaging data used in core mining operations. Each trailer contains a Windows-based high-performance compute cluster and performs data processing and AI inferencing at scale.
This is a high-impact role focused on improving robustness, deployment reliability, and operation across a fleet containing hundreds of compute nodes. You will take over day-to-day build pipelines and deployment operations to trailers and individual endpoints, and play a key role in improving automation from code to production.
The role requires a strong background in Windows administration, Python, deployment strategies for distributed systems, and system troubleshooting.
- Own and execute deployment operations across 100–150 on-prem Windows servers in multiple locations
- Support release flow through branch promotion / merge processes:
  - development → staging
  - staging → production
- Run and monitor deployments using custom Bitbucket pipelines
- Track deployment progress and outcomes across distributed server environments
- Identify areas of opportunity to enhance the build pipeline and collaborate with other team members to automate processes
- Investigate and patch failed deployments on affected servers
- Troubleshoot common deployment failures (service issues, permissions, path/config mismatches, host availability, etc.)
- Validate recovery results and confirm application/service health after patching
- Escalate recurring/systemic issues with clear evidence and documentation
- Improve system telemetry with a focus on operational reliability and fleet intelligence
- Identify servers that missed deployments because they were offline/unreachable during release windows
- Run manual deployment pipelines or recovery workflows when those servers return online
- Verify version consistency and deployment completion across all target servers
- Maintain clear records of delayed and catch-up deployments
- Build and maintain scripts to reduce repetitive deployment and remediation work
- Create tooling for:
  - pre-deployment checks
  - post-deployment validation
  - deployment status reporting
  - patch queues / catch-up queues
  - version consistency checks
- Improve script quality over time with:
  - logging
  - error handling
  - parameterization
  - safe execution patterns
  - operator-friendly output
- Create and maintain runbooks / SOPs for standard deployments, patching, and recovery
- Document recurring deployment failure patterns and remediation steps
- Improve repeatability and reduce dependency on tribal knowledge
- Partner with DevOps and engineering teams to identify practical automation opportunities
What Success Looks Like (First 90 Days)
- You can independently run regular deployment operations using our current Bitbucket pipeline workflow
- You can reliably manage branch-promotion deployment execution (development → staging, staging → production)
- You can patch failed deployments and run catch-up deployments for offline servers with minimal supervision
- You create or improve runbooks for common deployment and recovery scenarios
- You begin automating repetitive deployment support tasks to reduce manual load on DevOps
- 3+ years of hands-on systems administration experience (mid-to-senior candidates welcome)
- Strong Windows Server administration experience in production environments
- Strong Python scripting experience (required) for operational automation
- Experience supporting deployments across many servers (distributed/multi-site environments preferred)
- Strong troubleshooting skills for production systems and deployment failures
- Experience writing maintainable automation/scripts with logging and error handling
- Ability to work independently and reliably in a remote or hybrid setup
- Strong documentation habits (runbooks, procedures, troubleshooting notes)
- PowerShell scripting experience (strongly preferred in Windows environments)
- Experience with Bitbucket Pipelines (especially custom deployment workflows)
- Experience deploying and supporting Python-based applications/services
- Experience in on-prem server environments
- Experience handling partial deployment failures and offline nodes in distributed environments
- Familiarity with release branching workflows and deployment strategies in distributed systems
- Experience improving operational processes in legacy/manual deployment environments
- Experience with commercial DevOps tools for CI/CD, artifact management, containers, and MLOps
- Experience with containerization and Linux workloads
- Experience with Windows remote execution (e.g., PowerShell remoting / WinRM)
- Experience with deployment reporting or operational dashboards
- Experience with configuration standardization / drift detection
- Exposure to infrastructure automation tools (e.g., DSC, Ansible) in Windows environments
- Exposure to Infrastructure-as-Code (IaC) tools such as Terraform
- You are a hands-on operator who gets things done reliably
- You are comfortable working in imperfect systems and improving them incrementally
- You think in repeatable processes, not heroics
- You stay calm and methodical when troubleshooting failures across many servers
- You document what you do and make it easier for the next person
- You care about operational quality and helping teams move faster safely
This is a high-impact role where your work will directly improve deployment reliability and scalability. You will help create immediate operational leverage by reducing manual toil and freeing up DevOps capacity for strategic improvements.
If you want a role where your scripting and systems expertise will have visible, real-world impact across a large server fleet, this is it.
- Remote or Hybrid (Calgary-based preference or Canada-based candidates aligned with team working hours)
- No on-call requirement
Working at Enersoft, you will enjoy the following benefits:
- A casual and fun work environment
- Extended health and dental benefits
- Flexible schedule and opportunities for remote work
- Free parking at the office
- Robots with lasers!
- Dental care
- Extended health care
- Vision care