27 Jun 2024

Data Engineer, Pipelines

🇺🇸 The Bronx, NY, USA
Full Time
3+ years exp.
US$75,000 – US$100,000 per year

DEPARTMENT: Baseball Operations

REPORTS TO: Director, Baseball Systems

JOB STATUS: Full-Time, Exempt

Description:

Built upon our storied legacy, the New York Yankees look to attract the best possible talent not just on the field but in the front office as well. It is our shared responsibility to maintain the first-class reputation associated with the franchise in all aspects of our business.

The New York Yankees Baseball Operations department is accepting applications for an experienced Data Engineer with a focus on building data processing pipelines for our analytical and scouting data. The Data Engineer will be responsible for building automated data integration pipelines using Apache Spark in Databricks. This position will also assist in the development and maintenance of our internal Baseball Operations application and data warehouses.

Primary Responsibilities:

  • Develop and maintain automated ETL pipelines for various game tracking system datasets.
  • Develop and maintain scalable data pipelines, and build out new API and data source integrations to support continuing increases in data volume and complexity.
  • Maintain data warehouse environments and integrate various external and internal data sources to support analytical modeling.
  • Prepare, clean, and format analytical datasets for processing by data scientists.
  • Conduct database feature engineering to support ongoing quantitative research.
  • Work with developers to create and deploy systems for anomaly detection.
  • Interface with data scientists, software developers, and other baseball operations staff as needed.

Qualifications and Experience:

  • Bachelor’s or master’s degree in Computer Science, Software Engineering, or a related field, or equivalent training or work experience.
  • 3+ years of experience developing in Python and a SQL-based RDBMS.
  • 3+ years of experience developing ETL pipelines, including demonstrated use of multiple techniques for integrating different APIs, file formats, and delivery mechanisms.
  • Must have experience with distributed computing concepts and frameworks (Spark, Hadoop, Kafka); experience with Apache Spark preferred.
  • Must have experience with data warehouse, data lake, and enterprise big data platforms (Databricks experience a plus).
  • Ability to write concise, simple code that performs well.
  • Comfortable with agile delivery methodologies in a complex, fast-paced development environment.
  • Excellent communication and problem-solving skills with the ability to break down complex tasks and put together an execution strategy with little guidance.
  • An understanding of typical baseball data structures, basic and advanced baseball metrics, and knowledge of current baseball research areas.
This description is intended to describe the type of work being performed by a person assigned to this position. It is not an exhaustive list of all duties and responsibilities required of the employee. The New York Yankees are an Equal Opportunity Employer. The Company is committed to the principles of equal employment opportunity for all employees and applicants for employment.

The base annual salary for this position is $75,000-$100,000, plus a comprehensive benefits package.
