本文将讨论如何加快使用Spring Boot编写的应用程序将大量信息写入关系数据库的速度。一次写入大量行时,Hibernate一次插入一个行,如果有很多行,将导致大量等待。让我们考虑一个如何解决这个问题的案例。
我们使用Spring Boot应用程序。作为DBMS-> MS SQL Server,作为编程语言-Kotlin。当然,Java不会有任何区别。
我们需要编写的数据实体:
@Entity
@Table(schema = BaseEntity.schemaName, name = GoodsPrice.tableName)
data class GoodsPrice(
@Id
@Column(name = "GoodsPriceId")
@GeneratedValue(strategy = GenerationType.IDENTITY)
override val id: Long,
@Column(name = "GoodsId")
val goodsId: Long,
@Column(name = "Price")
val price: BigDecimal,
@Column(name = "PriceDate")
val priceDate: LocalDate
): BaseEntity(id) {
companion object {
const val tableName: String = "GoodsPrice"
}
}
SQL:
CREATE TABLE [dbo].[GoodsPrice](
[GoodsPriceId] [int] IDENTITY(1,1) NOT NULL,
[GoodsId] [int] NOT NULL,
[Price] [numeric](18, 2) NOT NULL,
[PriceDate] nvarchar(10) NOT NULL,
CONSTRAINT [PK_GoodsPrice] PRIMARY KEY(GoodsPriceId))
作为演示示例,我们假设我们需要分别记录20,000和50,000条记录。
让我们创建一个控制器,该控制器将生成数据并将其传输以进行记录和记录时间:
@RestController
@RequestMapping("/api")
class SaveDataController(private val goodsPriceService: GoodsPriceService) {
@PostMapping("/saveViaJPA")
fun saveDataViaJPA(@RequestParam count: Int) {
val timeStart = System.currentTimeMillis()
goodsPriceService.saveAll(prepareData(count))
val secSpent = (System.currentTimeMillis() - timeStart) / 60
logger.info("Seconds spent : $secSpent")
}
private fun prepareData(count: Int) : List<GoodsPrice> {
val prices = mutableListOf<GoodsPrice>()
for (i in 1..count) {
prices.add(GoodsPrice(
id = 0L,
priceDate = LocalDate.now().minusDays(i.toLong()),
goodsId = 1L,
price = BigDecimal.TEN
))
}
return prices
}
companion object {
private val logger = LoggerFactory.getLogger(SaveDataController::class.java)
}
}
我们还将创建用于写入数据的服务和一个商品库GoodsPriceRepository
@Service
class GoodsPriceService(
private val goodsPriceRepository: GoodsPriceRepository
) {
private val xmlMapper: XmlMapper = XmlMapper()
fun saveAll(prices: List<GoodsPrice>) {
goodsPriceRepository.saveAll(prices)
}
}
之后,我们将依次为20,000条记录和50,000条记录调用saveDataViaJPA方法。
安慰:
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
2020-11-10 19:11:58.886 INFO 10364 --- [ restartedMain] xmlsave.controller.SaveDataController : Seconds spent : 63
问题在于,Hibernate尝试在单独的查询中插入每一行,即20,000次。在我的机器上花了63秒。
对于50,000个条目166秒。
解决方案
可以做什么?主要思想是我们将通过缓冲区表进行写入:
@Entity
@Table(schema = BaseEntity.schemaName, name = SaveBuffer.tableName)
data class SaveBuffer(
@Id
@Column(name = "BufferId")
@GeneratedValue(strategy = GenerationType.IDENTITY)
override val id: Long,
@Column(name = "UUID")
val uuid: String,
@Column(name = "xmlData")
val xmlData: String
): BaseEntity(id) {
companion object {
const val tableName: String = "SaveBuffer"
}
}
数据库中表的SQL脚本
CREATE TABLE [dbo].[SaveBuffer](
[BufferId] [int] IDENTITY NOT NULL,
[UUID] [varchar](64) NOT NULL,
[xmlData] [xml] NULL,
CONSTRAINT [PK_SaveBuffer] PRIMARY KEY (BufferId))
向SaveDataController添加方法:
@PostMapping("/saveViaBuffer")
fun saveViaBuffer(@RequestParam count: Int) {
val timeStart = System.currentTimeMillis()
goodsPriceService.saveViaBuffer(prepareData(count))
val secSpent = (System.currentTimeMillis() - timeStart) / 60
logger.info("Seconds spent : $secSpent")
}
我们还向GoodsPriceService添加方法:
@Transactional
fun saveViaBuffer(prices: List<GoodsPrice>) {
val uuid = UUID.randomUUID().toString()
val values = prices.map {
BufferDTO(
goodsId = it.goodsId,
priceDate = it.priceDate.format(DateTimeFormatter.ISO_DATE),
price = it.price.stripTrailingZeros().toPlainString()
)
}
bufferRepository.save(
SaveBuffer(
id = 0L,
uuid = uuid,
xmlData = xmlMapper.writeValueAsString(values)
)
)
goodsPriceRepository.saveViaBuffer(uuid)
bufferRepository.deleteAllByUuid(uuid)
}
要进行写入,我们首先将生成一个唯一的uuid来区分我们正在写入的当前数据。接下来,我们使用xml形式的文本将数据写入创建的缓冲区。也就是说,不会有20,000个插入,而只有1个。
然后,我们使用一个查询(如Insert into…select)将数据从缓冲区传输到GoodsPrice表。
使用saveViaBuffer方法的GoodsPriceRepository:
@Repository
interface GoodsPriceRepository: JpaRepository<GoodsPrice, Long> {
@Modifying
@Query("""
insert into dbo.GoodsPrice(
GoodsId,
Price,
PriceDate
)
select res.*
from dbo.SaveBuffer buffer
cross apply(select temp.n.value('goodsId[1]', 'int') as GoodsId
, temp.n.value('price[1]', 'numeric(18, 2)') as Price
, temp.n.value('priceDate[1]', 'nvarchar(10)') as PriceDate
from buffer.xmlData.nodes('/ArrayList/item') temp(n)) res
where buffer.UUID = :uuid
""", nativeQuery = true)
fun saveViaBuffer(uuid: String)
}
最后,为了不将重复的信息存储在数据库中,我们通过uuid从缓冲区中删除数据。
让我们将20,000行和50,000行的saveViaBuffer方法称为:
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate:
insert into dbo.GoodsPrice(
GoodsId,
Price,
PriceDate
)
select res.*
from dbo.SaveBuffer buffer
cross apply(select temp.n.value('goodsId[1]', 'int') as GoodsId
, temp.n.value('price[1]', 'numeric(18, 2)') as Price
, temp.n.value('priceDate[1]', 'nvarchar(10)') as PriceDate
from buffer.xmlData.nodes('/ArrayList/item') temp(n)) res
where buffer.UUID = ?
Hibernate: select savebuffer0_.BufferId as bufferid1_1_, savebuffer0_.UUID as uuid2_1_, savebuffer0_.xmlData as xmldata3_1_ from dbo.SaveBuffer savebuffer0_ where savebuffer0_.UUID=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
2020-11-10 20:01:58.788 INFO 7224 --- [ restartedMain] xmlsave.controller.SaveDataController : Seconds spent : 13
从结果中可以看到,我们大大加快了数据记录的速度。
对于20,000条记录,13秒是63。
对于50,000条记录,27秒是166。
链接到测试项目