In this post, we will show you how to configure a Spring Batch job to read data from multiple CSV files and write the results into a single CSV file.
Project structure
This is the directory structure of a standard Gradle project.
Project dependencies
buildscript {
  repositories {
    mavenLocal()
    jcenter()
  }
  dependencies {
    classpath "org.springframework.boot:spring-boot-gradle-plugin:1.5.2.RELEASE"
  }
}

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'org.springframework.boot'

sourceCompatibility = 1.8

repositories {
  mavenLocal()
  mavenCentral()
}

dependencies {
  compileOnly('org.projectlombok:lombok:1.16.12')
  compile('org.springframework.boot:spring-boot-starter-batch:1.5.2.RELEASE')
  testCompile('org.springframework.boot:spring-boot-starter-test:1.5.2.RELEASE')
}

task wrapper(type: Wrapper) {
  gradleVersion = '3.2.1'
}
application.properties file
#empty
Spring Batch Jobs
CSV files

Three input files, all matched by the domain*.csv pattern:

File 1:
1,facebook.com
2,yahoo.com
3,google.com

File 2:
200,walkingtechie.blogspot.com
300,stackoverflow.com
400,oracle.com

File 3:
999,eclipse.org
888,baidu.com
777,twitter.com
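Conceptually, each input line is split on the comma and the two tokens are bound to the model's `id` and `domain` fields. A minimal plain-Java sketch of that per-line mapping (illustration only; the job itself uses Spring Batch's `DelimitedLineTokenizer` and `BeanWrapperFieldSetMapper`, and the class and method names here are made up for the example):

```java
public class LineMappingSketch {

    // Stand-in for the Domain model used by the job.
    static class Domain {
        int id;
        String domain;
    }

    // Splits an "id,domain" record and binds the tokens to the model,
    // mimicking what the tokenizer and field-set mapper do together.
    static Domain mapLine(String line) {
        String[] tokens = line.split(",");   // DelimitedLineTokenizer equivalent
        Domain d = new Domain();
        d.id = Integer.parseInt(tokens[0]);  // "id" column
        d.domain = tokens[1];                // "domain" column
        return d;
    }

    public static void main(String[] args) {
        Domain d = mapLine("1,facebook.com");
        System.out.println(d.id + " -> " + d.domain);
    }
}
```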
Create a job which will read from multiple CSV files and write into a single CSV file
package com.walking.techie.multiresource.jobs;
import com.walking.techie.multiresource.model.Domain;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
@Configuration
@EnableBatchProcessing
public class ReadMultiFileJob {

  @Autowired
  private JobBuilderFactory jobBuilderFactory;

  @Autowired
  private StepBuilderFactory stepBuilderFactory;

  @Value("csv/inputs/domain*.csv")
  private Resource[] resources;

  @Bean
  public Job readFiles() {
    return jobBuilderFactory.get("readFiles").incrementer(new RunIdIncrementer())
        .flow(step1()).end().build();
  }

  @Bean
  public Step step1() {
    return stepBuilderFactory.get("step1").<Domain, Domain>chunk(10)
        .reader(multiResourceItemReader()).writer(writer()).build();
  }

  @Bean
  public MultiResourceItemReader<Domain> multiResourceItemReader() {
    MultiResourceItemReader<Domain> resourceItemReader = new MultiResourceItemReader<>();
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
  }

  @Bean
  public FlatFileItemReader<Domain> reader() {
    FlatFileItemReader<Domain> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new DefaultLineMapper<Domain>() {{
      setLineTokenizer(new DelimitedLineTokenizer() {{
        setNames(new String[]{"id", "domain"});
      }});
      setFieldSetMapper(new BeanWrapperFieldSetMapper<Domain>() {{
        setTargetType(Domain.class);
      }});
    }});
    return reader;
  }

  @Bean
  public FlatFileItemWriter<Domain> writer() {
    FlatFileItemWriter<Domain> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("output/domain.all.csv"));
    writer.setLineAggregator(new DelimitedLineAggregator<Domain>() {{
      setDelimiter(",");
      setFieldExtractor(new BeanWrapperFieldExtractor<Domain>() {{
        setNames(new String[]{"id", "domain"});
      }});
    }});
    return writer;
  }
}
This job maps the values from each CSV file to a Domain object and writes every record to one CSV file.
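On the write side, the `DelimitedLineAggregator` with a `BeanWrapperFieldExtractor` effectively extracts each record's `id` and `domain` values and joins them with the configured delimiter. A plain-Java sketch of that aggregation (illustration only, not the Spring Batch API; the class and method names are made up for the example):

```java
public class LineAggregationSketch {

    // Mirrors DelimitedLineAggregator with delimiter "," over the
    // extracted fields {id, domain}.
    static String aggregate(int id, String domain) {
        return String.join(",", String.valueOf(id), domain);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(200, "walkingtechie.blogspot.com"));
    }
}
```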
A Java model class
package com.walking.techie.multiresource.model;
import lombok.Data;
@Data
public class Domain {

  int id;
  String domain;
}
Run Application
package com.walking.techie;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)
public class Application {

  public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
  }
}
Output
The output of the application will be stored in output/domain.all.csv.
1,facebook.com
2,yahoo.com
3,google.com
200,walkingtechie.blogspot.com
300,stackoverflow.com
400,oracle.com
999,eclipse.org
888,baidu.com
777,twitter.com
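The merged output above is simply the concatenation of the three input files, in the order the `MultiResourceItemReader` handed them to the single writer. A plain-Java sketch of that end-to-end effect (illustration only; the class name is made up for the example):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch {

    // Records from each input resource are appended, file by file,
    // into one output sequence, preserving the resource order.
    static List<String> merge(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        for (List<String> file : files) {  // MultiResourceItemReader order
            out.addAll(file);              // single FlatFileItemWriter appends
        }
        return out;
    }
}
```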
Note: This code has been compiled and run on a Mac notebook in IntelliJ IDEA.
Comments

Commenter:

@Value("csv/inputs/domain*.csv")
private Resource[] resources;

This piece of code will not work and will throw an exception. It should inject a String[], which then needs to be iterated over and converted to a Resource[]:

@Value("csv/inputs/domain*.csv")
private String[] resources;

for (String file : resources) {
  UrlResource urlResource = new UrlResource(file);
  resourceList.add(urlResource);
}

Author: The code in the article works as expected; this change is not required.

John Doe: It was throwing an error.

Author: Can you share your stack trace?