21 Dec 2024

Data Engineer, Research & Development

🇺🇸 Houston, TX, USA
Full Time
5+ years exp.

Department: Baseball Operations, Research and Development

Supervisor: Sr. Director, Research and Development

Classification: Full-Time/Exempt

Location: Houston, TX

The Houston Astros baseball organization is accepting applications for a Data Engineer to join our Research & Development team within Baseball Operations. We are seeking an applicant to support the growth of our data architecture using cloud-based data lake technologies. This role will work within a team of software developers supporting the broad needs of Baseball Operations and will be central to the workflow of departments across the organization, with opportunities to interface with other departments, understand their needs, and drive creative solutions.

Essential Functions & Responsibilities

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • Collaborate with the team on the design and implementation of a cloud-based data architecture.
  • Leverage Spark-based solutions to develop and maintain data processing pipelines that provide efficient access to data at various stages of transformation.
  • Integrate structured, semi-structured, and unstructured data sources, handling various formats including Parquet, JSON, and more.
  • Automate workflows and monitoring procedures to promote a maintainable infrastructure.
  • Write clean and iterative code and leverage continuous integration practices to deploy, support and operate data pipelines.
  • Interact with stakeholders internal to R&D (research analysts, application developers) and external to R&D to understand their needs from our architecture and data.
  • Participate in a rotating on-call schedule to tend to any immediate issues with our architecture and data.
  • Perform other duties as assigned.

Qualifications

  • Experience with one or more cloud platforms such as Azure, AWS, or GCP.
  • Experience building and maintaining ETL processes with Databricks, Snowflake, or other data lake technologies.
  • Experience with Apache Spark (especially PySpark) is a plus.
  • Proficiency with Python, including best practices and OOP design.
  • Proficiency with SQL and relational database structures.
  • Experience working on software teams and promoting software development best practices, including continuous integration, documentation, process automation, and monitoring.
  • Resilience in evolving environments and advocacy for technical excellence.

Work Environment

This job operates in an office setting. This role routinely uses standard office equipment such as computers, phones and photocopiers. The noise level is usually moderate but can be loud within the stadium environment.

Physical Demands

While performing the duties of this job, the employee is occasionally required to stand; walk; sit (for long periods of time); use hands to handle or feel objects, tools or controls; reach with hands and arms; climb stairs; talk or hear. The employee may occasionally lift or move equipment, up to 20 pounds.

Specific vision abilities required by this job include close and focused vision.

Position Type and Expected Hours of Work

Ability to work a flexible schedule, including extended hours, evenings, weekends, and holidays.

Travel

Some travel may be expected in this role.

Other Duties

Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities and activities may change at any time with or without notice.

We are an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

EOE/M/F/Vet/Disability

Experience

Preferred: 5+ years